Abstract. An early phase clinical trial is the first step in evaluating 
the effects in humans of a potential new anti-disease agent or combination 
of agents. Usually called "phase I" or "phase I/II" trials, these 
experiments typically have the nominal scientific goal of determining an 
acceptable dose, most often based on adverse event probabilities. This 
arose from a tradition of phase I trials to evaluate cytotoxic agents 
for treating cancer, although some methods may be applied in other 
medical settings, such as treatment of stroke or immunological diseases. 
Most modern statistical designs for early phase trials include model- 
based, outcome-adaptive decision rules that choose doses for successive 
patient cohorts based on data from previous patients in the trial. Such 
designs have seen limited use in clinical practice, however, due to their 
complexity, the requirement of intensive, computer-based data moni- 
toring, and the medical community's resistance to change. Still, many 
actual applications of model-based outcome-adaptive designs have been 
remarkably successful in terms of both patient benefit and scientific 
outcome. In this paper I will review several Bayesian early phase trial 
designs that were tailored to accommodate specific complexities of the 
treatment regime and patient outcomes in particular clinical settings. 

Key words and phrases: Adaptive design, Bayesian design, clinical 
trial, dose-finding, phase I trial, phase I/II trial. 



1. INTRODUCTION 

1.1 An Early Phase Trial 

Clinical trials are much more complex than typical statistical designs
may indicate. An example is a phase I stem cell transplantation (SCT)
trial in which the continual reassessment method (CRM, O'Quigley, Pepe
and Fisher, 1990; O'Quigley, 1990) was applied to optimize the
per-administration dose (PAD) of gemcitabine, $d_G$, when added to an
established two-agent preparative regimen consisting of intravenous
busulfan and melphalan (Andersson et al., 2002). The design was used for
each of two separate, parallel trials, one for allogeneic transplant
(allotx), which uses stem cells from a matched donor, and one for
autologous transplant (autotx), which uses the patient's own stem cells.
For each patient, during the period from day $-10$ to day $-1$ preceding
the SCT on day 0, each of the three agents was given on two or more days
using a particular schedule and PAD. Previously, a six-day schedule of
additional gemcitabine had been tried, but it was found to be too toxic,
so in this trial each patient's assigned $d_G$ was given in a two-day
schedule, on each of days $-8$ and $-3$, for total gemcitabine dose
$2d_G$. "Toxicity" was defined to be any regimen-related grade 4 or 5
adverse event (AE) occurring within 30 days post transplant and
affecting a vital organ, but excluding AEs that occur routinely in SCT,
such as marrow suppression and, in allotx, graft-versus-host disease
(GVHD). Using the usual CRM criterion, the design's nominal goal was to
find $d_G$ from a predetermined set of 10 PADs ranging from 225 to 3675
mg/m$^2$ having toxicity probability, $\pi(d_G,\theta)$, with posterior
mean $E\{\pi(d_G,\theta) \mid \text{data}\}$ closest to the target 0.10,
where $\theta$ denotes the model parameters.
The principal investigator (PI) specified the conservatively low target
0.10 in part due to the previous negative experience with the six-day
schedule, and also because 0.10 is consistent with the toxicity rate of
the established two-agent regimen. In each subgroup, gemcitabine doses
were to be chosen for successive cohorts of 3 patients, up to a maximum
of 36 patients, with the safety rules that no untried dose could be
skipped when escalating and accrual to the subgroup would be stopped if
the lowest dose $d_G = 225$ was unacceptably toxic, formally if
$\Pr\{\pi(225,\theta) > 0.10 \mid \text{data}\} > 0.80$.
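
The dose-selection criterion and the two safety rules just described can be sketched numerically. The following is only an illustration under stated assumptions, not the model fitted in the SCT trial: it assumes a one-parameter power model $\pi_j(\theta) = s_j^{\exp(\theta)}$, a normal prior on $\theta$, a grid approximation to the posterior, and a hypothetical five-level skeleton $s_j$ (the trial itself used 10 PADs). The function names are invented for this sketch.

```python
import math

def power_model(skeleton_j, theta):
    """Assumed one-parameter CRM model: pi_j(theta) = skeleton_j ** exp(theta)."""
    return skeleton_j ** math.exp(theta)

def grid_posterior(skeleton, n_tox, n_pat, prior_sd=1.16, half_width=4.0, points=801):
    """Normalized posterior weights for theta on an evenly spaced grid."""
    thetas = [-half_width + 2.0 * half_width * i / (points - 1) for i in range(points)]
    log_w = []
    for t in thetas:
        lw = -0.5 * (t / prior_sd) ** 2          # N(0, prior_sd^2) prior, up to a constant
        for s, y, n in zip(skeleton, n_tox, n_pat):
            if n > 0:
                # Clamp to keep the logarithms finite at the grid extremes.
                p = min(max(power_model(s, t), 1e-12), 1.0 - 1e-12)
                lw += y * math.log(p) + (n - y) * math.log(1.0 - p)
        log_w.append(lw)
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return thetas, [v / total for v in w]

def crm_decision(skeleton, n_tox, n_pat, target=0.10, stop_cut=0.80):
    """Return (next dose index, or None to stop; posterior mean toxicity per dose)."""
    thetas, w = grid_posterior(skeleton, n_tox, n_pat)
    means = [sum(wi * power_model(s, t) for wi, t in zip(w, thetas)) for s in skeleton]
    # Safety rule: stop if the lowest dose is likely to exceed the target.
    p_lowest_too_toxic = sum(wi for wi, t in zip(w, thetas)
                             if power_model(skeleton[0], t) > target)
    if p_lowest_too_toxic > stop_cut:
        return None, means
    # CRM criterion: posterior mean toxicity closest to the target.
    best = min(range(len(skeleton)), key=lambda j: abs(means[j] - target))
    highest_tried = max((j for j, n in enumerate(n_pat) if n > 0), default=-1)
    return min(best, highest_tried + 1), means   # never skip an untried dose

skeleton = [0.03, 0.06, 0.10, 0.15, 0.22]        # hypothetical prior guesses
dose, means = crm_decision(skeleton, n_tox=[0, 0, 0, 0, 0], n_pat=[3, 0, 0, 0, 0])
print(dose, [round(m, 3) for m in means])
```

Passing a first cohort with three toxicities at the lowest level, `crm_decision(skeleton, [3, 0, 0, 0, 0], [3, 0, 0, 0, 0])`, triggers the stopping rule in this sketch and returns `None` in place of a dose.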

Because clinical trials are medical experiments with 
human subjects, they often do not play out precisely 
as designed. In the course of this trial: (i) when
no toxicity was seen in the first 24 patients the PI
decided to change the two-day gemcitabine schedule
$(-8, -3)$ to the three-day schedule $(-8, -6, -3)$
while maintaining the same total dose by giving $\frac{2}{3}d_G$
on each day, (ii) this three-day schedule was quickly
found to cause severe skin toxicity in the first few
patients who received it and accrual was suspended,
and (iii) we re-designed the trial again by returning
to the two-day schedule, but (iv) at the PI's request
we also expanded the set of possible $d_G$ values. After
28 allotx patients had been treated and fully evalu- 
ated, however, (v) concern about observed grade 3 
mucositis and skin toxicities seen at higher dose lev- 
els caused the physicians to expand the definition of 
"toxicity" to include these events, which previously 
had been excluded if they could be resolved thera- 
peutically within two weeks. Along with this change, 
they also decided to change the CRM target from 
0.10 to 0.15. These last changes had the combined 
effect of substantially increasing the numbers of patients
with "toxicity" among those treated at higher
dose levels and greatly reducing the value of $d_G$
recommended by the CRM. Per standard regulatory
procedure, it was necessary to obtain institutional 
review board approval for each change in the design.
So, this trial actually was designed five times; it evaluated
effects of a combination of three agents given in overlapping
pre-transplant schedules; both the dose and schedule of gemcitabine
were varied adaptively during the trial, using three different
formulations of the CRM to choose gemcitabine PADs and ad hoc
decisions for changing schedules; there were two simultaneous trials
involving different SCT modalities; the dominant effects on toxicity
were both $d_G$ and the schedule of gemcitabine; and the definition of
toxicity was changed near the end of the trial to be more inclusive
and thus obtain a more protective dose selection criterion. Overshadowing
all of this were the actual goals, which were not only 
to control toxicity but also to reduce the rate and 
severity of GVHD in the allotx patients and to im- 
prove the rates of engraftment and 100-day survival, 
compared to the established preparative regimen. 

1.2 Some Generalities 

Denote the treatment administered to a given pa- 
tient by x. In the designs discussed here, x will be 
the dose of an agent, the dose pair of two agents 
given together, or a (schedule, dose) combination 
consisting of a finite sequence of administration times 
and corresponding doses. Actual patient outcome in 
oncology trials is very complex, often including nu- 
merous different types of toxicity scored on ordinal 
scales of severity (grade), disease status scored as a 
binary or ordinal variable, with each often recorded 
at several successive evaluations, as well as the times 
of delay or discontinuation of treatment, drop-out 
or death. In sharp contrast, the outcome Y used for 
statistical decision-making during the trial usually 
is defined to be a single variable or possibly a vec- 
tor of two variables. In the examples given here, the 
designs assume that Y is, respectively, a single bi- 
nary toxicity indicator, a vector of two binary in- 
dicators of toxicity and efficacy, a vector of ordinal 
toxicities or a time-to-toxicity variable subject to 
right censoring. The model consists of a probability
density function (p.d.f.) or mass function (p.m.f.)
$f(y \mid x,\theta)$ of $Y$ for a patient who receives treatment
$x$, and a prior $p(\theta \mid \xi)$, where $\theta$ is the model
parameter vector and $\xi$ are fixed hyperparameters. The
data observed from the first $n$ patients in the trial
are $\mathcal{D}_n = \{(x^{(1)},Y^{(1)}),\ldots,(x^{(n)},Y^{(n)})\}$, with
likelihood $\mathcal{L}_n(\mathcal{D}_n \mid \theta) = \prod_{i=1}^{n} f(Y^{(i)} \mid x^{(i)},\theta)$ and posterior
$p_n(\theta \mid \mathcal{D}_n,\xi) \propto \mathcal{L}_n(\mathcal{D}_n \mid \theta)\,p(\theta \mid \xi)$.

All of the designs that I will discuss here utilize 
Bayesian "learn-as-you-go" decision rules to choose 
$x$ from the set of possible treatments, $\mathcal{X}$, based on
the posterior $p_n(\theta \mid \mathcal{D}_n)$ computed from the most
recent data available when a new patient is enrolled.
Such a sequentially adaptive decision algorithm may 



BAYESIAN DESIGNS FOR EARLY PHASE TRIALS 






be expressed as a sequence $\alpha = \{\alpha_n\}$ of functions
$\alpha_n : \mathcal{D}_n \rightarrow \mathcal{X} \cup \emptyset$, where $\emptyset$ denotes the empty set and
$\alpha_n(\mathcal{D}_n) = \emptyset$ means "Do not treat with any $x \in \mathcal{X}$."
In general, $\{\alpha_n\}$ may include several adaptive decision
rules used together, such as rules for choosing a
dose, a dose pair or a (schedule, dose) combination,
for temporarily suspending accrual to wait for additional
data on previously treated patients, or for
stopping the trial early because no $x \in \mathcal{X}$ is acceptable.
The $(n+1)$st iteration of the Bayesian medical
decision-making process may be described by the sequence
of mappings

(1)  $$\mathcal{D}_n \xrightarrow{\ \text{Bayes}\ } p_n(\theta \mid \mathcal{D}_n) \xrightarrow{\ \alpha_n\ } x_{n+1} \longrightarrow Y_{n+1} \longrightarrow \mathcal{D}_{n+1}$$

in which Bayes' Theorem uses the assumed probability
model $(f,p)$ to map the observed data into a
posterior, the decision rules $\alpha_n$ use this to choose
the next treatment $x_{n+1}$, the patient is treated, the
outcome $Y_{n+1}$ is observed, and $(x_{n+1},Y_{n+1})$ are
incorporated into the data. This process is repeated
until the end of the trial, which may be when either
a maximum sample size $N_{\max}$ or trial duration
$T_{\max}$ is reached, or because the trial is stopped
early. The process whereby the expanding data set is
used by applying Bayes' Theorem repeatedly to turn
$p_n(\theta \mid \mathcal{D}_n,\xi)$ into $p_{n+1}(\theta \mid \mathcal{D}_{n+1},\xi)$ may be called
"iterative Bayesian learning," in that one learns about
$\theta$ as additional data are observed during the trial.
In the sequel, to simplify notation I will suppress
dependence of the posterior on the prior hyperparameters $\xi$.

A design consists of the trial's entry criteria, treatments
$\mathcal{X}$, set of possible patient outcomes, probability
model $(f,p)$, decision rules $\alpha$, $N_{\max}$ or $T_{\max}$, and
possibly a cohort size, $c$. Since $\alpha_n$ acts on $\mathcal{D}_n$
indirectly through $p_n(\theta \mid \mathcal{D}_n)$ in Bayesian adaptive
designs, evaluation of a design's properties must account
for the fact that $\{p_n(\theta \mid \mathcal{D}_n)\}$ is a sequence of
statistics. The complexity of the process summarized
by (1), even for binary $Y$ and a single dose
$x$, has motivated the routine use of computer simulation
under each of a set of assumed "true" $f$ as
a tool to evaluate the frequentist operating characteristics
(OCs) of the design for various $\alpha$'s. This is
used as a basis for choosing decision rules, calibrat- 
ing design parameters, and possibly calibrating the 
prior. There is nothing "non-Bayesian" about using 
frequentist OCs of a Bayesian design to adjust the 



prior and design parameters. On the contrary, be- 
cause simulating a trial that is based on a Bayesian 
design allows the physician to better understand the 
consequences of particular prior values, simulation 
provides a tool for the physician to modify his/her 
prior so that it more accurately reflects what the 
physician actually believes. It also is important to 
examine the prior's properties in the natural parameter
domain, such as the probability of toxicity at
dose $x$, $\pi(x,\theta)$, rather than in terms of elements of
$\theta$ that may have no intuitive meaning to a physician.
One should also examine the first few decisions $\alpha_n$
for each of several possible configurations of data, 
in order to avoid a prior that does not make sense. 
This is especially important for evaluating decisions 
that must be made early in the trial based on very 
little data, such as choosing the second cohort's dose 
based on data from the first cohort of three patients. 
The prior always has consequences in an early phase 
trial, regardless of how "uninformative" it may ap- 
pear to be. 
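
A simulation study of the kind described above can be sketched in a few lines. This is a minimal illustration rather than any of the designs reviewed here: it assumes a one-parameter CRM-type model $\pi_j(\theta) = s_j^{\exp(\theta)}$ with a normal prior on $\theta$, a hypothetical skeleton, cohorts of three, a do-not-skip rule and no early stopping. The names `simulate_trial` and `operating_characteristics` are invented for the sketch, and the assumed "true" toxicity vector must have the same length as the skeleton.

```python
import math
import random

SKELETON = [0.05, 0.10, 0.20, 0.30, 0.45]       # hypothetical prior toxicity guesses
THETAS = [-3.0 + 0.1 * i for i in range(61)]    # grid for the single model parameter

def posterior_means(n_tox, n_pat, prior_sd):
    """Posterior mean toxicity at each dose under pi_j = SKELETON[j] ** exp(theta)."""
    log_w = []
    for t in THETAS:
        lw = -0.5 * (t / prior_sd) ** 2          # normal prior, up to a constant
        for s, y, n in zip(SKELETON, n_tox, n_pat):
            if n:
                p = s ** math.exp(t)
                lw += y * math.log(p) + (n - y) * math.log1p(-p)
        log_w.append(lw)
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    z = sum(w)
    return [sum(wi * s ** math.exp(t) for wi, t in zip(w, THETAS)) / z for s in SKELETON]

def simulate_trial(true_tox, rng, target=0.25, prior_sd=1.16, cohort=3, n_max=30):
    """Run one trial; return (selected dose index, total number of toxicities)."""
    k = len(true_tox)
    n_tox, n_pat, dose = [0] * k, [0] * k, 0
    while sum(n_pat) < n_max:
        for _ in range(cohort):                  # treat one cohort at the current dose
            n_tox[dose] += rng.random() < true_tox[dose]
            n_pat[dose] += 1
        means = posterior_means(n_tox, n_pat, prior_sd)
        best = min(range(k), key=lambda j: abs(means[j] - target))
        highest_tried = max(j for j in range(k) if n_pat[j] > 0)
        dose = min(best, highest_tried + 1)      # do not skip an untried dose
    return min(best, highest_tried), sum(n_tox)  # final selection among tried doses

def operating_characteristics(true_tox, n_sims=200, seed=1, **design):
    """Selection frequency per dose and mean number of toxicities per trial."""
    rng = random.Random(seed)
    picks, total_tox = [0] * len(true_tox), 0
    for _ in range(n_sims):
        selected, tox = simulate_trial(true_tox, rng, **design)
        picks[selected] += 1
        total_tox += tox
    return [c / n_sims for c in picks], total_tox / n_sims

scenario = [0.05, 0.12, 0.25, 0.40, 0.55]        # one assumed "true" dose-toxicity curve
for sd in (0.5, 1.16, 2.0):                      # compare several prior standard deviations
    sel, mean_tox = operating_characteristics(scenario, prior_sd=sd)
    print(sd, [round(s, 2) for s in sel], round(mean_tox, 1))
```

Rerunning the loop over several scenarios and prior settings gives the kind of table that can be reviewed with the physician when calibrating the prior and design parameters.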

If one does not wish to use simulation as a design
tool, the most common alternative approach is to
first specify a formal optimality criterion and solve
for $\alpha$ mathematically (cf. Haines, Perevozskaya and
Rosenberger, 2003; Dette et al., 2008). However, the
simulation-based OCs of a design obtained using a 
particular optimality criterion are often surprising, 
essentially because such a design's properties are a 
consequence of the optimality criterion used. One 
may maximize information, minimize the variance 
of a particular estimated quantity, minimize mean 
or maximum sample size, control false positive or 
other incorrect decision probabilities, minimize ex- 
pected financial costs, minimize expected trial du- 
ration, optimize outcomes for patients in the trial 
or for future patients, etc. Since, unavoidably, such 
goals often are at odds with each other, use of the 
word "optimal" without qualification may be very 
misleading. 

1.3 Some Practical Issues 

Actual clinical trial logistics can be quite com- 
plex. While the CRM adaptively chooses a new dose 
from a continuum for each new patient, Goodman, 
Zahurak and Piantadosi (1995) proposed the prac- 
tical modifications of choosing doses for successive 
cohorts of several patients and limiting doses to a 
finite set. In most of the outcome adaptive dose- 
finding applications that I have seen, each newly 
chosen $x$ is given to a cohort. Moreover, a "do not
skip" safety rule often is imposed that does not allow
an untried dose to be skipped when escalating.
While limiting doses to a finite set of discrete values
usually does not allow the exact maximum tolerated dose (MTD)
to be chosen, the difference between the chosen dose and the true
MTD may be small, provided that the chosen doses
are reasonably spaced. As a counterexample, using the dose
set {100, 200, 400, 800, 1600} must miss an actual 
MTD of 600 by at least 200, which is larger than the 
difference between the first two doses. Moreover, es- 
calation from 400 to 800 or from 800 to 1600 may 
be unsafe, regardless of whether a do-not-skip rule 
is imposed. 

For example, if a cohort size $c = 3$ is used, then
$\alpha_n$ chooses $x_{n+1} = x_{n+2} = x_{n+3}$ and, formally, $Y_{n+1}$,
$Y_{n+2}$ and $Y_{n+3}$ all must be observed before updating
the data and making the next decision. However,
if $Y_{n+1}$ has been fully evaluated at the time
patient $n+2$ is accrued, then $x_{n+2}$ may be chosen
more reliably by applying the decision criterion
based on the updated posterior incorporating the
data $(x_{n+1},Y_{n+1})$ from patient $n+1$; similarly, if
$Y_{n+1}$ and $Y_{n+2}$ are known when patient $n+3$ is
accrued, their data may be included to choose $x_{n+3}$.
A simple approach that works surprisingly well is to 
use the "look ahead" rule: If the possible outcomes 
of treated patients for whom Y has not yet been fully 
evaluated will not alter the chosen x for the next 
patient, then treat the next patient with x without 
delay (Thall et al., 1999). This is closely related to 
the general fact that the time window required to 
evaluate Y per its definition and the accrual rate 
together play critically important roles in trial conduct.
For example, if $Y = I$(toxicity within 3 months
from start of therapy) and the accrual rate is 6 patients
per month, then any outcome-adaptive rule
based on $\Pr(Y = 1 \mid x,\theta)$ is virtually useless, since a
large number of patients (about 18 at this accrual rate and
evaluation window) will be treated before the
rule may be applied. Some possible ways to implement
an outcome-adaptive design in such settings
are as follows: (i) use c = 1 but enroll only a very 
small proportion of eligible patients in the trial, (ii) 
use c = 3 or larger with accrual suspended between 
cohorts, but use the look-ahead rule to improve lo- 
gistical feasibility, or (iii) redefine the outcome to be 
time-to-toxicity, but use a safety rule that may delay 
accrual temporarily to allow the data from previously 
treated patients to mature (cf. Bekele et al., 2008). 
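
The look-ahead rule lends itself to a direct enumeration: fill in every possible outcome for the patients still under evaluation and check whether the decision changes. The sketch below assumes binary outcomes and uses a deliberately simple stand-in decision rule (`toy_rule`, hypothetical, based on a Beta(0.5, 0.5) posterior mean at the current dose); any function mapping the toxicity and patient counts to a dose index could be substituted.

```python
from itertools import product

def look_ahead(decide, n_tox, n_pat, pending):
    """Return the next dose if it is the same for every possible completion
    of the pending outcomes, otherwise None (wait for more data).

    pending lists the dose index of each treated patient whose Y is not yet known.
    """
    decisions = set()
    for outcome in product((0, 1), repeat=len(pending)):
        tox, pat = list(n_tox), list(n_pat)
        for j, y in zip(pending, outcome):
            tox[j] += y
            pat[j] += 1
        decisions.add(decide(tox, pat))
    return decisions.pop() if len(decisions) == 1 else None

def toy_rule(n_tox, n_pat, target=0.30, margin=0.10):
    """Hypothetical stand-in rule: compare the Beta(0.5, 0.5) posterior mean
    at the highest tried dose with target +/- margin."""
    current = max(j for j, n in enumerate(n_pat) if n > 0)
    mean = (0.5 + n_tox[current]) / (1.0 + n_pat[current])
    if mean > target + margin:
        return max(current - 1, 0)                    # de-escalate
    if mean < target - margin and n_pat[current] >= 3:
        return min(current + 1, len(n_pat) - 1)       # escalate
    return current                                    # stay

# Two patients pending at dose 1: the decision depends on their outcomes.
print(look_ahead(toy_rule, n_tox=[0, 2], n_pat=[3, 4], pending=[1, 1]))
# One patient pending at dose 1: every outcome leads to de-escalation.
print(look_ahead(toy_rule, n_tox=[0, 3], n_pat=[3, 4], pending=[1]))
```

The enumeration grows as $2^k$ in the number $k$ of pending patients, which is harmless for the cohort sizes of 1 to 3 discussed here.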

At the start of the trial, when $n = 0$, the first
treatment $x_1$ may be chosen by applying the decision
rule $\alpha_0$ based on the prior $p(\theta \mid \xi)$. Methods
for choosing a starting dose have been proposed by
Goodman, Zahurak and Piantadosi (1995) and Cheung
(2005). Most commonly, $x_1$ is chosen by the
physician based on the nature of $x$, the definition of
$Y$, the trial's entry criteria and clinical experience
treating the disease. For example, a trial enrolling
prostate cancer patients with a life expectancy of six
years is very different from a trial enrolling brain tumor
patients with a life expectancy of six months.
Similarly, depending on the trial's entry criteria and
treatment, "toxicity" may be defined as anything
from severe fatigue to regimen-related death. It thus
makes sense, during the prior elicitation process, to
calibrate $p(\theta)$ so that $\alpha_0$ agrees with the physician's
$x_1$, since the motivation for choosing a particular $x_1$
is based on prior experience.

In this paper I will review several designs that 
focus on the problem of reflecting more fully partic- 
ular complexities of (x,Y). Each design addresses 
some, but not all, of the issues in the SCT trial de- 
scribed earlier. These methods were motivated by 
problems that I have encountered during the pro- 
cess of designing early phase trials over the past 
19 years. Each design was developed by a collab- 
orative team including one or more physicians, one 
or more statisticians and a computer programmer. 
Each may be called a "phase I" design in that dose- 
finding is based on toxicity, or a "phase I/II" de- 
sign in that dose-finding is based on both efficacy 
and toxicity, with the exception of the design de- 
scribed in Section 5 that jointly optimizes schedule 
and dose (Braun et al., 2007). While it is tempting 
to think that a "one-size-fits-all" design encompass- 
ing all phase I/II possibilities may be constructed, 
in my experience clinical research is far too com- 
plex to do this, and each new trial design problem 
often has unique aspects that require a new model 
or method. The particular data structure, probabil- 
ity model and decision rules that should be used to 
design a clinical trial are best determined through 
careful discussion with the physicians planning the 
trial, and must strike a compromise between the de- 
sire to accurately reflect the medical process and ad- 
dress scientific goals while accommodating the prac- 
tical realities of trial conduct. 

I will not discuss methods for eliciting and cal- 
ibrating priors, since this topic could easily fill an 
additional manuscript. I will not explore the eth- 
ical aspects of adaptive decision rules either, since 
they also are quite complex (cf. Palmer, 2002). Early
phase trial design and conduct are difficult and complicated
in large part due to the tension between
optimizing the benefit and safety of patients treated 
in the trial, and learning about the effects of each 
x on Y to benefit future patients, as well as eco- 
nomic constraints and regulatory requirements. In 
this regard, a statistician constructing an adaptive 
design should be mindful of the ethical issues regarding
what happens, for example, to patient number
7 because (s)he was treated with $x_7$ based on how
$\alpha_6$ acted on $p_6(\theta \mid \mathcal{D}_6)$.

2. DOSE-FINDING FOR TWO-AGENT 
COMBINATIONS 

2.1 Outcomes and Models 

Thall et al. (TMML, 2003) proposed a method 
for determining one or more acceptable dose pairs 
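$x = (x_1,x_2)$ of two cytotoxic agents given together,
based on a binary indicator $Y$ of toxicity. To stabilize
the model numerically, each dose is standardized
so that $0 \le x_1, x_2 \le 1$, for example, by dividing
each raw dose by some maximum value, so that each
$x \in \mathcal{X} = [0,1]^2$. The probability model for toxicity is

(2)  $$\Pr(Y = 1 \mid x,\theta) = \pi(x,\theta) = \frac{\alpha_1 x_1^{\beta_1} + \alpha_2 x_2^{\beta_2} + \alpha_3 (x_1^{\beta_1} x_2^{\beta_2})^{\beta_3}}{1 + \alpha_1 x_1^{\beta_1} + \alpha_2 x_2^{\beta_2} + \alpha_3 (x_1^{\beta_1} x_2^{\beta_2})^{\beta_3}},$$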

where $\theta = (\alpha_1,\beta_1,\alpha_2,\beta_2,\alpha_3,\beta_3)$. All elements of $\theta$
are positive valued, which ensures that $\pi(x,\theta)$ is a
probability and that it is increasing in each entry
of $x$. Denoting the subvectors $\theta_j = (\alpha_j,\beta_j)$ for $j = 1,2,3$,
so that $\theta = (\theta_1,\theta_2,\theta_3)$, the model (2) contains
the submodels

(3)  $$\pi_1(x_1,\theta_1) = \Pr\{Y = 1 \mid x = (x_1,0),\theta_1\} = \frac{\alpha_1 x_1^{\beta_1}}{1 + \alpha_1 x_1^{\beta_1}},$$

which is the probability of toxicity when agent 1 is
given alone at dose $x_1$ and, similarly,

(4)  $$\pi_2(x_2,\theta_2) = \Pr\{Y = 1 \mid x = (0,x_2),\theta_2\} = \frac{\alpha_2 x_2^{\beta_2}}{1 + \alpha_2 x_2^{\beta_2}}$$

for agent 2 given alone at dose $x_2$. Since $\alpha_j x_j^{\beta_j} =
\exp\{\log(\alpha_j) + \beta_j\log(x_j)\}$, (3) and (4) are logistic models
in a log standardized dose. TMML assume that



there is clinical experience with each single agent 
when used alone, since this often is a requirement 
before investigating a combination in humans. Since 
$\theta_j$ parameterizes $\pi_j(x_j,\theta_j)$ for $j = 1,2$ and $\theta_3$
parameterizes interaction between the two agents, a
key element of TMML's approach is that the priors
$p(\theta_1 \mid \xi_1)$ and $p(\theta_2 \mid \xi_2)$ are informative while $p(\theta_3 \mid \xi_3)$
is vague. Assuming gamma priors on the elements
of $\theta$ for tractability, TMML provide a detailed algorithm
for eliciting $p(\theta_1 \mid \xi_1)$ and $p(\theta_2 \mid \xi_2)$, although if
historical data are available, the posteriors from preliminary
fits of such data may be used as these priors
for trial design and conduct. Considering $\pi(x,\theta)$
geometrically as a response surface over the domain
$[0,1]^2$, this says that there is substantial prior knowledge
about each of the two lines $\{x : x_2 = 0\}$ and
$\{x : x_1 = 0\}$ on the edges of the response surface, but
otherwise little is known about the surface, so it is
like a sheet tied down at two edges but otherwise
varying freely. In particular, the meaning of $\theta_1$ in
the model $\pi(x,\theta) = \pi(x,(\theta_1,\theta_2,\theta_3))$ is very different
from its meaning in the submodel $\pi_1(x_1,\theta_1)$. This
is underscored by the prior effective sample sizes
computed by Morita, Thall and Mueller (2008) for
the gamma priors given by TMML (2003, Section
3), which are 547.3 for $p_1(\theta_1 \mid \xi_1)$, 756.8 for $p_2(\theta_2 \mid \xi_2)$,
0.01 for $p_3(\theta_3 \mid \xi_3)$ and 1.5 for $p(\theta \mid \xi)$. This says that,
with respect to toxicity, a priori a lot is known about 
how each agent behaves when used alone, but almost 
nothing is known about how the two agents behave 
together. 
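
As a concrete check of how the single-agent submodels sit inside the two-agent model, the sketch below evaluates the toxicity surface under the assumption that (2) has the form $\pi = u/(1+u)$ with $u = \alpha_1 x_1^{\beta_1} + \alpha_2 x_2^{\beta_2} + \alpha_3 (x_1^{\beta_1} x_2^{\beta_2})^{\beta_3}$. The parameter values are hypothetical and chosen only for illustration.

```python
def tox_prob(x1, x2, theta):
    """Two-agent toxicity probability for standardized doses in [0, 1].

    theta = (a1, b1, a2, b2, a3, b3), all positive; the functional form is
    the one assumed in the lead-in for model (2).
    """
    a1, b1, a2, b2, a3, b3 = theta
    u = a1 * x1 ** b1 + a2 * x2 ** b2 + a3 * (x1 ** b1 * x2 ** b2) ** b3
    return u / (1.0 + u)

def single_agent_prob(x, a, b):
    """Single-agent submodel: a * x^b / (1 + a * x^b)."""
    return a * x ** b / (1.0 + a * x ** b)

theta = (0.4, 1.5, 0.6, 2.0, 0.3, 1.0)   # hypothetical parameter values

# Setting one dose to zero removes that agent's term and the interaction term.
print(tox_prob(0.5, 0.0, theta), single_agent_prob(0.5, 0.4, 1.5))
print(tox_prob(0.0, 0.7, theta), single_agent_prob(0.7, 0.6, 2.0))
```

Because the interaction term vanishes whenever either dose is zero, data or prior information on a single agent inform only the corresponding edge of the surface, which is the geometric picture given above.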




Fig. 1. Isocontours of a dose-toxicity probability surface for
two dimensional dose $x = (x_1,x_2)$.

2.2 Decision Criteria

The dose-finding method exploits the following geometric
structure on $\mathcal{X} = [0,1]^2$. For each $p \in (0,1)$,
the set $\mathcal{X}_p(\theta) = \{x : \pi(x,\theta) = p\}$ is the isocontour of
all dose pairs having toxicity probability $p$. Several
isocontours for a particular fixed $\theta$ are illustrated
in Figure 1. Since $\mathcal{X}_p(\theta) \cap \mathcal{X}_q(\theta) = \emptyset$ if $p \neq q$ and
$\bigcup_{0<p<1} \mathcal{X}_p(\theta) = [0,1]^2$, every pair $x$ falls on a unique
$\mathcal{X}_p(\theta)$ for some $p$. The interaction term $\alpha_3 (x_1^{\beta_1} x_2^{\beta_2})^{\beta_3}$
in (2) is used instead of the simpler term $\alpha_3 x_1 x_2$ in
order to give the model sufficient flexibility to allow
"S" shaped isocontours, as shown in Figure 1.

The design proceeds in two stages. In stage 1,
doses are chosen for successive cohorts from a finite
set of values on the predetermined fixed diagonal
line $L_1$, shown as the straight line at approximately
45° in Figure 1. The design is robust to the
particular angle of $L_1$, as long as it is not too far
from 45°. Since the response surface $\pi(x,\theta)$ increases
in each argument $x_1$ and $x_2$, $\pi(x,\theta)$ must increase
as $x$ moves up $L_1$ from lower left to upper right.
Given target toxicity probability $\pi^*$, dose-finding in
stage 1 proceeds using the CRM criterion of choosing
$x$ for each cohort from the set on $L_1$ to minimize
$|E\{\pi(x,\theta) \mid \mathcal{D}_n\} - \pi^*|$, starting at the lowest dose in
the set, not skipping untried doses when escalating,
and adding additional doses to the set once the first
toxicity is observed. That is, restricting $x$ to $L_1$ in
stage 1 reduces dose selection to a one-dimensional
problem, and a conventional CRM algorithm may
be applied. Geometrically, one may think of stage 1
as walking up and down the toxicity surface, along
$L_1$, looking for dose pairs with toxicity probability
close to $\pi^*$.

In stage 2, $x$ is chosen for successive cohorts from
the random isocontour

(5)  $$\mathcal{X}_{\pi^*}(\mathcal{D}_n) = \{x : E\{\pi(x,\theta) \mid \mathcal{D}_n\} = \pi^*\}.$$

Since $\mathcal{X}_{\pi^*}(\mathcal{D}_n)$ contains infinitely many $x$, an additional
criterion is needed to choose one $x$ for each
cohort in stage 2. TMML suggest two criteria, one
based on the clinical criterion of "cancer killing potential"
and the other the more usual statistical goal
of maximizing Fisher Information. Denoting the elements
of $\theta$ by $\theta_1,\ldots,\theta_6$ for convenience, the Fisher
Information matrix $I(x,\theta)$ for dose $x$ has $(j,k)$ entry
$\{\partial\pi(x,\theta)/\partial\theta_j\}\{\partial\pi(x,\theta)/\partial\theta_k\}/[\pi(x,\theta)\{1 - \pi(x,\theta)\}]$.
Under the Bayesian model, $x$ is chosen to maximize
the posterior mean log determinant of $I(x,\theta)$ given
the current data, $E[\log\{\det I(x,\theta)\} \mid \mathcal{D}_n]$. Doses are
chosen for successive cohorts in stage 2 by alternating
between the two subsets of $\mathcal{X}_{\pi^*}(\mathcal{D}_n)$ to the
left and right of $L_1$. For each subset, the $x$ optimizing
cancer killing is determined, the $x$ maximizing
Fisher Information is determined, the average
of these two dose pairs is computed, and the
$x \in \mathcal{X}_{\pi^*}(\mathcal{D}_n)$ closest to this average is assigned to the
cohort. At the end of the trial, any $x \in \mathcal{X}_{\pi^*}(\mathcal{D}_n)$ is
a solution. Thus, for example, one may choose three
final dose pairs on $\mathcal{X}_{\pi^*}(\mathcal{D}_N)$, one on $L_1$, one to the
left of $L_1$ and one to the right of $L_1$, and randomize
patients among these three $x$ pairs in a subsequent
phase II trial.
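
Points on an isocontour can be located numerically because the toxicity surface is increasing in each dose. The sketch below does this by bisection for a fixed parameter vector; in the design itself the surface is the posterior mean $E\{\pi(x,\theta) \mid \mathcal{D}_n\}$, which would replace `tox_prob` here. The functional form of the surface and the parameter values are assumptions made for the illustration.

```python
def tox_prob(x1, x2, theta):
    """Assumed two-agent surface: u / (1 + u) with
    u = a1*x1^b1 + a2*x2^b2 + a3*(x1^b1 * x2^b2)^b3."""
    a1, b1, a2, b2, a3, b3 = theta
    u = a1 * x1 ** b1 + a2 * x2 ** b2 + a3 * (x1 ** b1 * x2 ** b2) ** b3
    return u / (1.0 + u)

def isocontour_x2(x1, p, theta, tol=1e-10):
    """Find x2 in [0, 1] with tox_prob(x1, x2, theta) = p by bisection.

    Returns None when the contour for p does not cross the vertical line at x1.
    """
    lo, hi = 0.0, 1.0
    if not (tox_prob(x1, lo, theta) <= p <= tox_prob(x1, hi, theta)):
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tox_prob(x1, mid, theta) < p:
            lo = mid          # surface too low: move up in x2
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta = (0.4, 1.5, 0.6, 2.0, 0.3, 1.0)   # hypothetical parameter values
target = 0.30
for x1 in (0.2, 0.5, 0.8):
    print(x1, isocontour_x2(x1, target, theta))
```

Sweeping `x1` over a fine grid traces the whole contour, from which candidate dose pairs on either side of $L_1$ can be picked.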

In their illustrative application, TMML use a cohort
size of 2 with 60 patients divided into $N_1 = 20$
(10 cohorts) in stage 1 and $N_2 = 40$ (20 cohorts)
in stage 2. Simulations show that using $(N_1,N_2)$ equal to
either (30, 30) or (40, 20) gives a design with inferior
properties compared to (20, 40). Cohort-by-cohort
computations show that the target isocontour
$\mathcal{X}_{\pi^*}(\mathcal{D}_n)$ varies substantially with each new cohort's
data even after $n = 30$ or 40 patients, but stabilizes
by $n = 50$ or 60. This is the case essentially
because a binary outcome is a very small amount
of information per patient. For total sample size
$N = N_1 + N_2 = 60$, however, the method is quite
reliable in terms of choosing dose pairs that have
true toxicity probability close to $\pi^*$.

3. USING BOTH EFFICACY AND TOXICITY 

From a clinical perspective, the primary purpose 
of treatment is to fight disease, and safety is never 
a secondary concern in any medical setting. Thus, 
both scientifically and medically, both efficacy and 
toxicity matter at all stages of clinical investigation. 
In many dose-finding trials, the entry criteria spec- 
ify patients with such poor prognosis that response 
is very unlikely, between 4% and 10%. In such settings,
targeting even a low response rate $\pi_E^* = 0.10$
or 0.20 and using a phase II type rule to stop accrual
if the observed response rate is likely to be
below $\pi_E^*$ at any acceptable dose may be impractical,
since few or no responses are expected. This
is the most common rationale for conducting phase 
I based on toxicity alone, while recording data on 
biological effects and possible clinical anti-disease 
effects. However, actual response rates vary widely 
between phase I trials, and many have complete or 
partial response rates well over 20% (cf. Horstmann 
et al., 2005). Moreover, patients enroll in a phase
I trial motivated by the hope that the new treatment
will achieve an anti-disease effect, not simply
the desire that no toxicity will occur. 

3.1 Outcomes and Models 

These considerations lead to the idea that, when 
it is realistic to target a response rate of 10% or 
larger, dose-finding should be done using a phase 
I/II design based on both E = efficacy and T = 
toxicity. Many phase I/II designs have been pro- 
posed (Gooley et al., 1994; O'Quigley, Fenton and 
Hughes, 2001; Braun, 2002; Ivanova, 2003). The fol- 
lowing phase I/II methodology, "EffTox," is based 
on the developments given by Thall and Russell 
(1998) and Thall and Cook (2004). Illustrations are 
given by Thall, Cook and Estey (2006) and Whelan
et al. (2008). Patient outcome may be either
a three-category or bivariate binary variable. The
former case applies when $E$ and $T$ are defined in
such a way that they are disjoint but $E \neq T^c$, so
that $Y$ takes on values in $\{E, T, N\}$ where $N =
(E \cup T)^c = \{$no response and no toxicity$\}$. This is
appropriate if, for example, toxicity is irreversible
organ damage or regimen-related death. When it is
possible for both $E$ and $T$ to occur, the outcome is
bivariate binary, $Y = (Y_E,Y_T)$, where $Y_k$ indicates
the outcome $k = E,T$. For either case, denote the
outcome probabilities for a patient given dose $x$ by
$\pi(x,\theta) = (\pi_E(x,\theta),\pi_T(x,\theta))$.

For the trinary outcome case, the three-parameter
model used by Thall and Russell (1998) is motivated
by the idea that the three outcomes are ordered
in the sense that $N < E < T$, with the idea that
higher $x$ is more likely to push the patient's outcome
upward along this scale. The model is given
by $\pi_T(x,\theta) = g^{-1}(\mu_0 + \mu_2 x)$ and $\pi_E(x,\theta) = g^{-1}(\mu_0 +
\mu_1 + \mu_2 x) - \pi_T(x,\theta)$, where $g$ is a link function, $\theta =
(\mu_0,\mu_1,\mu_2)$ and $\mu_1,\mu_2 > 0$. This model forces $\pi_E(x,\theta)$
to be very nonmonotone in $x$. A more flexible four-parameter
model (Thall and Cook, 2004, Section 3)
is given by $\Pr(E \mid T^c,x,\theta) = g^{-1}(\beta_{E,0} + \beta_{E,1}x)$ and
$\pi_T(x,\theta) = g^{-1}(\beta_{T,0} + \beta_{T,1}x)$, where $\theta = (\beta_{E,0},\beta_{E,1},
\beta_{T,0},\beta_{T,1})$ and $\beta_{E,1},\beta_{T,1} > 0$. Using this model,
$\pi_E(x,\theta) = \Pr(E \mid T^c,x,\theta)\{1 - \pi_T(x,\theta)\}$. For either
model, $f(Y \mid x,\theta) = \prod_{y=E,N,T}\{\pi_y(x,\theta)\}^{I(Y=y)}$.
For the bivariate binary case, the model must specify
the four elementary outcome probabilities $\pi_{a,b}(x,\theta) =
\Pr(Y_E = a, Y_T = b \mid x,\theta)$ for $a,b \in \{0,1\}$. The
p.m.f. of a patient treated at dose $x$ is $f(Y \mid x,\theta) =
\prod_{a=0}^{1}\prod_{b=0}^{1}\{\pi_{a,b}(x,\theta)\}^{I(Y_E=a,\,Y_T=b)}$. The general approach
used by Thall and Cook (2004) and Thall,
Nguyen and Estey (2008) is to first specify the two
marginal dose-outcome distributions $\pi_k(x,\theta) =
g^{-1}\{\eta_k(x,\theta)\}$, in terms of link function $g$ and linear
terms $\eta_k(x,\theta)$ for $k = E$ and $T$, and then define the
joint distribution in terms of the marginals. Temporarily
suppressing $x$ and $\theta$, $\pi_{a,b}$ is determined by
$(a,b,\pi_E,\pi_T,\psi)$, where $\psi$ is an association parameter.
This may be done tractably using a Gumbel
distribution,

(6)  $$\pi_{a,b} = \pi_E^{a}(1-\pi_E)^{1-a}\,\pi_T^{b}(1-\pi_T)^{1-b} + (-1)^{a+b}\,\pi_E(1-\pi_E)\,\pi_T(1-\pi_T)\left(\frac{e^{\psi}-1}{e^{\psi}+1}\right),$$

with $\psi$ real-valued, or a Gaussian copula, $C_\psi(u,v) =
\Phi_\psi(\Phi^{-1}(u),\Phi^{-1}(v))$, for $0 \le u,v \le 1$, where $\Phi_\psi$ is
the bivariate standard normal c.d.f. with correlation
$\psi$ and $\Phi$ is the univariate $N(0,1)$ c.d.f. Under this
copula, $\pi_{0,0} = \Phi_\psi(\Phi^{-1}(1-\pi_E),\Phi^{-1}(1-\pi_T))$ with
$\pi_{1,0} = 1 - \pi_T - \pi_{0,0}$, and $\pi_{1,1} = \pi_E + \pi_T + \pi_{0,0} - 1$.
If $g$ is the probit link, $\pi_k = \Phi(\eta_k)$ and $\pi_{0,0} =
\Phi_\psi(-\eta_E,-\eta_T)$.
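
The Gumbel construction is easy to check numerically. The sketch below assumes that the association factor in (6) is $(e^{\psi}-1)/(e^{\psi}+1)$ and verifies that the four cell probabilities sum to one and return the stated marginals; the function name and the numerical values are illustrative only.

```python
import math

def gumbel_joint(pi_e, pi_t, psi):
    """Cell probabilities pi[(a, b)] = Pr(Y_E = a, Y_T = b) from the marginals
    pi_e, pi_t and a real-valued association parameter psi."""
    assoc = (pi_e * (1 - pi_e) * pi_t * (1 - pi_t)
             * (math.exp(psi) - 1.0) / (math.exp(psi) + 1.0))
    return {
        (a, b): pi_e ** a * (1 - pi_e) ** (1 - a) * pi_t ** b * (1 - pi_t) ** (1 - b)
                + (-1) ** (a + b) * assoc
        for a in (0, 1) for b in (0, 1)
    }

cells = gumbel_joint(pi_e=0.40, pi_t=0.25, psi=1.0)
print(cells)
print(sum(cells.values()))                    # total probability
print(cells[(1, 0)] + cells[(1, 1)])          # marginal Pr(Y_E = 1)
print(cells[(0, 1)] + cells[(1, 1)])          # marginal Pr(Y_T = 1)
```

With $\psi = 0$ the association term vanishes and the cells reduce to the products of the marginals, so $\psi$ measures departure from independence in either direction.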

A major practical issue is that the $\eta_k(x,\theta)$'s should
be realistic but the model must be numerically tractable,
to facilitate the processes of fitting historical
data if available, prior elicitation, and computing
posterior decision criteria thousands of times while
simulating the trial during the design process. It often
is important to allow $\pi_E(x,\theta)$ to be nonmonotone
in $x$, which may be appropriate for biological
agents, such as viral vectors expressing cytokines
aimed at triggering an immune response to kill tumor
cells. This may be done very effectively by assuming
a simple quadratic $\eta_E(x,\theta) = \beta_{E,0} + x\beta_{E,1} +
x^2\beta_{E,2}$, although other functions may be used. While
$\eta_T(x,\theta) = \beta_{T,0} + x\beta_{T,1}$ with $\Pr(\beta_{T,1} > 0) = 1$ is appropriate
for cytotoxic agents, in other settings a
quadratic also may be used for $\eta_T(x,\theta)$. For example,
if "toxicity" includes infection and an anti-cancer
agent also kills bacterial or fungal infections,
then $\pi_T(x,\theta)$ may be nonmonotone and actually decrease
with higher $x$.

Thall and Cook (2004) provide a penalized least squares method for establishing the prior $p(\theta\mid\xi)$ based on elicited means $\hat{m}_{y,j}$ and standard deviations (s.d.'s) $\hat{\sigma}_{y,j}$ of $\pi_y(x_j,\theta)$ for $y = E,T$ and several doses $x_1,\ldots,x_m$. Since each prior mean $m_{y,j}(\xi)$ and s.d. $\sigma_{y,j}(\xi)$ of $\pi_y(x_j,\theta)$ is a function of the fixed hyperparameters $\xi$ characterizing $p(\theta\mid\xi)$, nonlinear least squares may be used to solve for $\xi$ by minimizing the objective function

$$h(\xi) = \sum_{y=E,T}\sum_{j=1}^{m}\left[\{m_{y,j}(\xi) - \hat{m}_{y,j}\}^2 + \{\sigma_{y,j}(\xi) - \hat{\sigma}_{y,j}\}^2\right] + c\sum_{1\le j<k\le m}\{\tilde{\sigma}_j - \tilde{\sigma}_k\}^2, \tag{7}$$

where each $\tilde{\sigma}_j$ is a prior standard deviation in $\xi$. The second sum in (7) is included to limit the variability among the prior s.d.'s, using a small penalty constant $c > 0$.

3.2 Dose Admissibility and Efficacy-Toxicity Trade-offs

The dose-finding algorithm relies on two different types of posterior decision criteria to choose $x$ from a finite set of possibilities, $\mathcal{X} = \{x^{(1)},\ldots,x^{(K)}\}$. The first criterion determines which doses are acceptable, and the second chooses the best acceptable dose. Let $\underline{\pi}_E$ be a fixed lower limit on $\pi_E(x,\theta)$ and $\overline{\pi}_T$ a fixed upper limit on $\pi_T(x,\theta)$. The fixed limits are specified by the physician. Let $p^*_E$ and $p^*_T$ both be fixed upper probability cut-offs, usually selected from the range 0.80 to 0.95. A dose $x$ is unacceptable if it is likely to have either unacceptably low efficacy or unacceptably high toxicity, formally if

$$\Pr\{\pi_E(x,\theta) < \underline{\pi}_E \mid \mathcal{D}_n\} > p^*_E \quad\text{or}\quad \Pr\{\pi_T(x,\theta) > \overline{\pi}_T \mid \mathcal{D}_n\} > p^*_T. \tag{8}$$

A dose $x$ is acceptable if neither inequality in (8) holds. The set of acceptable doses in $\mathcal{X}$ based on $\mathcal{D}_n$ is denoted by $\mathcal{A}_n$. These criteria are essentially those used by Thall, Simon and Estey (1995) as stopping rules in phase II trials, and the second criterion in (8) is used routinely for deciding whether to stop a phase I trial, for example, when using the CRM, if the lowest dose is too toxic. For example, when using EffTox, if $p^*_E = p^*_T = 0.90$, the rules in (8) are equivalent to saying that $x$ is acceptable if $\Pr\{\pi_E(x,\theta) > \underline{\pi}_E \mid \mathcal{D}_n\} > 0.10$ and $\Pr\{\pi_T(x,\theta) < \overline{\pi}_T \mid \mathcal{D}_n\} > 0.10$. While, intuitively, these may seem like rather weak requirements, if one replaces 0.10 by a large cut-off such as 0.80 by setting $p^*_E = p^*_T = 0.20$ in (8), then the rules are nearly certain to stop any trial very quickly after a very small number of patients, due to the large variability of the posterior probabilities used in (8). This gets at the important distinction between determining the acceptability of $x$ for the purpose of dose-finding with small to moderate sample sizes, and the confirmatory statement "$x$ is safe and effective" formalized by inequalities such as $\Pr\{\pi_E(x,\theta) > \underline{\pi}_E \mid \mathcal{D}_n\} > 0.95$ and $\Pr\{\pi_T(x,\theta) < \overline{\pi}_T \mid \mathcal{D}_n\} > 0.95$. Such confirmatory conclusions are inappropriate based on early phase trial results since they can only be established convincingly by a large sample size, regardless of what the posterior probabilities may be.
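The acceptability rule in (8) is easy to evaluate once posterior draws of the two marginal probabilities are available. The sketch below assumes such draws have already been produced by some sampler (for example MCMC) and simply estimates the two posterior probabilities by sample proportions; the function name `is_acceptable`, the limits and the draws are hypothetical.

```python
def is_acceptable(pi_e_draws, pi_t_draws, pi_e_lower, pi_t_upper,
                  p_e_cut=0.90, p_t_cut=0.90):
    """Apply the two acceptability criteria in (8) to one dose.

    pi_e_draws, pi_t_draws: posterior draws of pi_E(x, theta) and
        pi_T(x, theta) for this dose (assumed to come from a sampler).
    pi_e_lower, pi_t_upper: physician-specified fixed limits.
    p_e_cut, p_t_cut: fixed upper probability cut-offs p*_E and p*_T.
    """
    # Monte Carlo estimate of Pr{pi_E < lower limit | data}.
    prob_low_efficacy = sum(d < pi_e_lower for d in pi_e_draws) / len(pi_e_draws)
    # Monte Carlo estimate of Pr{pi_T > upper limit | data}.
    prob_high_toxicity = sum(d > pi_t_upper for d in pi_t_draws) / len(pi_t_draws)
    # The dose is unacceptable if either inequality in (8) holds.
    return not (prob_low_efficacy > p_e_cut or prob_high_toxicity > p_t_cut)


# Hypothetical posterior draws, for illustration only.
e_draws = [0.15, 0.25, 0.35, 0.45, 0.55, 0.30, 0.40, 0.50, 0.20, 0.60]
t_draws_ok = [0.20] * 10
t_draws_bad = [0.50] * 10

ok = is_acceptable(e_draws, t_draws_ok, pi_e_lower=0.25, pi_t_upper=0.40)
bad = is_acceptable(e_draws, t_draws_bad, pi_e_lower=0.25, pi_t_upper=0.40)
```

With cut-offs of 0.90, a dose is ruled out only when the posterior puts most of its mass on the wrong side of a limit, which is the deliberately permissive behavior discussed above.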

To describe the second decision criterion, for simplicity, I will focus on the bivariate binary case, where $\pi(x,\theta) \in [0,1]^2$. To compare two acceptable doses, say, $x^{(1)}$ and $x^{(2)}$, based on the posteriors of $\pi(x^{(k)},\theta) = (\pi_E(x^{(k)},\theta),\pi_T(x^{(k)},\theta))$ for $k = 1,2$, some method for reducing each pair $\pi(x^{(k)},\theta)$ to a one-dimensional criterion is required, as inevitably is the case when a statistic of dimension $\geq 2$ is used for comparison. The EffTox method does this by formalizing the idea that a higher risk of toxicity is a reasonable trade-off for a higher probability of achieving anti-disease effect. The method first computes the posterior means $\mu^{(n)}(x) = (\mu_E^{(n)}(x),\mu_T^{(n)}(x)) = (E\{\pi_E(x,\theta)\mid\mathcal{D}_n\}, E\{\pi_T(x,\theta)\mid\mathcal{D}_n\})$ for each $x \in \mathcal{X}$. Each $\mu^{(n)}(x)$ is then reduced to a one-dimensional criterion by the following geometric construction, which begins by eliciting several pairs of fixed probabilities, $p^{(j)} = (p_E^{(j)},p_T^{(j)})$, $j = 1,\ldots,m$, that the physician considers equally desirable. A target curve, $\mathcal{C}$, is fit to the elicited pairs, treating $p_T$ as a monotone increasing function of $p_E$, or, equivalently, reversing the roles of $p_T$ and $p_E$. This should be done using a graphical representation of $\mathcal{C}$ to provide a means for the physician to adaptively modify his/her target pairs. Given $p \in [0,1]^2$, let $p_{\mathcal{C}}$ denote the point where the straight line segment in $[0,1]^2$ passing through $p$ and the ideal point $(1,0)$ intersects $\mathcal{C}$. The desirability of $p$ may be defined as

$$\delta(p) = \exp\left\{-\frac{\|p - (1,0)\|}{\|p_{\mathcal{C}} - (1,0)\|}\right\}, \tag{9}$$

where $\|\cdot\|$ denotes Euclidean distance. This has maximum $\delta(1,0) = 1$, with $\delta(p)$ decreasing as $p$ moves away from $(1,0)$ along any straight line in $[0,1]^2$. Several other definitions of $\delta(p)$ may be used, although (9) is reasonable and tractable. The contour of all $p$ having desirability $u$ is $\mathcal{C}_u = \{p \in [0,1]^2 : \delta(p) = u\}$, so that $\mathcal{C} = \mathcal{C}_{e^{-1}}$. Denote the set of real numbers $u$ such that $\mathcal{C}_u \neq \emptyset$ by $R_{\mathcal{C}}$. Since $\mathcal{C}_u \cap \mathcal{C}_v = \emptyset$ for $u \neq v$, the family $\{\mathcal{C}_u, u \in R_{\mathcal{C}}\}$ partitions $[0,1]^2$.

Fig. 2. Example of posterior means $\mu^{(n)}(x^{(j)}) = (E\{\pi_E(x^{(j)},\theta)\mid \mathrm{data}_n\}, E\{\pi_T(x^{(j)},\theta)\mid \mathrm{data}_n\})$ for two dose pairs $x^{(1)}$ and $x^{(2)}$ (denoted by round dots), given alone in the left-hand graph (Figure 2a), and with the addition of the target contour $\mathcal{C}$ constructed from elicited target points (denoted by x) and several resulting desirability contours (Figure 2b).

This construction is used to quantify the desirability of a dose $x$ by evaluating (9) at $p = \mu^{(n)}(x)$. To compare doses $x^{(1)}$ and $x^{(2)}$, we compute $\mu^{(n)}(x^{(1)})$ and $\mu^{(n)}(x^{(2)})$, illustrated by the two round dots in Figure 2a, and then compute their desirabilities $\delta(\mu^{(n)}(x^{(1)}))$ and $\delta(\mu^{(n)}(x^{(2)}))$, as shown in Figure 2b. The elicited pairs are represented by the symbol "x" in Figure 2b, which also shows $\mathcal{C}$ and several $\mathcal{C}_u$ along with their numerical $u$ values. During the trial, if no dose is acceptable, formally if $\mathcal{A}_n = \emptyset$, then accrual is stopped with no dose selected; otherwise each cohort is given the dose $x$ maximizing $\delta(\mu^{(n)}(x))$ among all $x \in \mathcal{A}_n$. This methodology has been used for dose-finding trials in acute stroke, treatment for GVHD in SCT, chemotherapy of acute leukemia, and anergized cells given post-transplant to accelerate immune reconstitution following allotx.
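The geometric construction behind the desirability in (9) can be sketched numerically. The code below assumes the target contour is supplied as a function giving $p_T$ in terms of $p_E$, and locates the intersection point $p_{\mathcal{C}}$ by bisection along the ray from the ideal point $(1,0)$ through $p$. The straight-line contour `linear_contour` is purely hypothetical and stands in for a curve fitted to elicited pairs; it is not the contour of any actual trial.

```python
import math


def desirability(p, contour, t_hi=2.0, tol=1e-12):
    """Desirability exp(-||p - (1,0)|| / ||p_C - (1,0)||) as in (9).

    p: pair (p_E, p_T) in the unit square.
    contour: function mapping p_E to the p_T value on the target curve,
        assumed increasing in p_E with contour(1.0) > 0.
    t_hi: search bound along the ray (2.0 exceeds the square's diagonal).
    """
    p_e, p_t = p
    dist = math.hypot(p_e - 1.0, p_t)
    if dist == 0.0:
        return 1.0  # the ideal point itself has maximal desirability
    u_e, u_t = (p_e - 1.0) / dist, p_t / dist  # unit direction from (1, 0)

    def gap(t):
        # Negative while the ray point lies on the ideal side of the contour.
        return t * u_t - contour(1.0 + t * u_e)

    lo, hi = 0.0, t_hi
    if gap(lo) >= 0.0 or gap(hi) <= 0.0:
        raise ValueError("ray does not cross the contour within the search range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    dist_to_contour = 0.5 * (lo + hi)  # equals ||p_C - (1, 0)||
    return math.exp(-dist / dist_to_contour)


def linear_contour(p_e):
    """Hypothetical target contour through (0.2, 0) and (1, 0.6)."""
    return 0.75 * (p_e - 0.2)


d_on_contour = desirability((1.0, 0.6), linear_contour)
d_first = desirability((0.40, 0.50), linear_contour)
d_second = desirability((0.25, 0.50), linear_contour)
```

Any point lying on the target contour receives desirability $e^{-1}$, and of two points with equal toxicity probability the one with higher efficacy probability receives the larger value under this hypothetical contour.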

It is important to consider the consequences of how one sets goals in the bivariate binary outcome case. Recall that $\pi_{a,b} = \Pr(Y_E = a, Y_T = b)$ with $\pi_E = \pi_{1,1} + \pi_{1,0}$ and $\pi_T = \pi_{1,1} + \pi_{0,1}$. Several authors have proposed choosing $x$ to maximize $\pi_{1,0}$, the probability of the best possible outcome, efficacy and no toxicity, or the conditional probability $\pi_{E\mid T^c} = \pi_{1,0}/(1 - \pi_T)$. Unfortunately, most new treatments simply don't work that way. A new therapy that is either more aggressive, for example, a higher dose, or highly active biologically is likely to decrease $\pi_{0,0}$ and increase some combination of $\pi_{1,0}$, $\pi_{0,1}$ and $\pi_{1,1}$. Treating $\pi = (\pi_{0,0},\pi_{1,0},\pi_{0,1},\pi_{1,1})$ as fixed for simplicity, suppose that standard treatment gives outcome probability vector $\pi^{(0)} = (0.50,0.10,0.30,0.10)$, which has marginals $(\pi_E,\pi_T) = (0.20,0.40)$. Suppose that experimental treatment $x^{(1)}$ has $\pi^{(1)} = \pi(x^{(1)}) = (0.30,0.20,0.30,0.20)$, which has marginals $(\pi_E^{(1)},\pi_T^{(1)}) = (0.40,0.50)$, a doubling of $\pi_E$ from 0.20 to 0.40 and a 25% increase in $\pi_T$ from 0.40 to 0.50. Suppose that a competing experimental treatment $x^{(2)}$ has $\pi^{(2)} = \pi(x^{(2)}) = (0.30,0.20,0.45,0.05)$, which has marginals $(\pi_E^{(2)},\pi_T^{(2)}) = (0.25,0.50)$, a slight increase in $\pi_E$ from 0.20 to 0.25 with the same increase in $\pi_T$ as given by $x^{(1)}$. Since $\pi_{1,0}(x^{(1)}) = \pi_{1,0}(x^{(2)}) = 0.20$ and $\pi_{E\mid T^c}(x^{(1)}) = \pi_{E\mid T^c}(x^{(2)}) = 0.40$, a method based on either $\pi_{1,0}$ or $\pi_{E\mid T^c}$ would consider $x^{(1)}$ and $x^{(2)}$ to be equivalent. In contrast, the trade-off based method would consider $x^{(1)}$ superior to $x^{(2)}$.
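The arithmetic in this comparison can be checked directly. The short sketch below takes each outcome probability vector in the order $(\pi_{0,0},\pi_{1,0},\pi_{0,1},\pi_{1,1})$ used above and returns the marginals together with the two criteria that fail to separate the treatments; the helper name `summarize` is hypothetical.

```python
def summarize(pi):
    """Marginals and single-number criteria for a bivariate binary outcome.

    pi: tuple (pi_00, pi_10, pi_01, pi_11), where pi_ab = Pr(Y_E = a, Y_T = b).
    """
    pi_00, pi_10, pi_01, pi_11 = pi
    pi_e = pi_10 + pi_11          # marginal probability of efficacy
    pi_t = pi_01 + pi_11          # marginal probability of toxicity
    return {
        "pi_E": pi_e,
        "pi_T": pi_t,
        "pi_10": pi_10,                          # efficacy without toxicity
        "pi_E_given_no_T": pi_10 / (1.0 - pi_t)  # Pr(efficacy | no toxicity)
    }


standard = summarize((0.50, 0.10, 0.30, 0.10))
x1 = summarize((0.30, 0.20, 0.30, 0.20))
x2 = summarize((0.30, 0.20, 0.45, 0.05))
```

Both experimental treatments share the same values of `pi_10` and `pi_E_given_no_T`, while their efficacy marginals differ, which is why a criterion based on the marginal pair can separate them.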

4. FINDING PATIENT-SPECIFIC DOSES 

Thall, Nguyen and Estey (2008) generalized EffTox to account for patient heterogeneity by using the patient's vector $Z = (Z_1,\ldots,Z_q)$ of covariates observed at enrollment. The method requires historical data, $\mathcal{H}$, to obtain an informative distribution on covariate effect parameters for use in trial design and conduct. The model and method account for dose effects, covariate effects and possible dose-covariate interactive effects on $\pi_E$ and $\pi_T$. The design assigns each patient a dose that is individualized based on the patient's $Z$ vector. This is very different from conventional early phase trial designs, since (i) patients with different covariates may receive different doses at the same point in the trial, (ii) the entry criteria may change adaptively, with the possibility that enrollment may be shut down for some patients but continued for others, and (iii) at the end of the trial a computer-based rule is provided for assigning each future patient's $x$ based on his/her $Z$ vector, rather than choosing a single dose for all patients.

For designs with individualized treatment assignment rules utilizing $Z$ (cf. Ratain et al., 1996; Babb and Rogatko, 2001), the $i$th patient's data are $(x_{[i]}, Y_i, Z_i)$, the probability model is elaborated by defining $f(y\mid Z,x,\theta)$ for a patient with covariates $Z$ who receives treatment $x$, and $a_n$ is a function of $(Z,\mathcal{D}_n)$. To accommodate $Z$ and historical data in the design described here, let $\tau$ denote either a dose $x$ in the trial or a historical treatment from the set $\{\tau_1,\ldots,\tau_m\}$. The probability model given earlier is extended by defining the marginal probabilities, $\pi_k(\tau,Z,\theta) = g^{-1}\{\eta_k(\tau,Z,\theta)\}$, $k = E,T$, for a patient with covariates $Z$ given treatment $\tau$, assuming linear terms of the general form

$$\eta_k(\tau,Z,\theta) = \beta_k Z + \sum_{j=1}^{m}(\mu_{k,j} + \xi_{k,j}Z)\,I(\tau = \tau_j) + \{\omega_k(x,\alpha_k) + x\gamma_k Z\}\,I(\tau = x) \tag{10}$$

for $k = E,T$, where $\beta_k Z = \beta_{k,1}Z_1 + \cdots + \beta_{k,q}Z_q$ accounts for covariate main effects, $\gamma_k Z = \gamma_{k,1}Z_1 + \cdots + \gamma_{k,q}Z_q$ accounts for dose-covariate interactions, $\mu_k = (\mu_{k,1},\ldots,\mu_{k,m})$ are historical main treatment effects, $\xi_{k,j}Z = \xi_{k,j,1}Z_1 + \cdots + \xi_{k,j,q}Z_q$ accounts for covariate interactions with the $j$th historical treatment, and $\omega_E(x,\alpha_E)$ and $\omega_T(x,\alpha_T)$ are the usual dose-outcome functions characterizing main dose effects. The functions $\omega_E$ and $\omega_T$ may be quadratic or linear functions of $x$, as given earlier. For fitting the historical data, (10) takes the form

$$\eta_k(\tau_j,Z,\theta) = \mu_{k,j} + \beta_k Z + \xi_{k,j}Z \tag{11}$$

for $j = 1,\ldots,m$ and $k = E,T$.

For fitting the data obtained during the trial, (10) is

$$\eta_k(x,Z,\theta) = \omega_k(x,\alpha_k) + \beta_k Z + x\gamma_k Z \tag{12}$$

for $k = E,T$.



A much more parsimonious model that accounts for dose-covariate interactions is obtained by replacing $x\gamma_k Z$ in (12) with either $\gamma_k\{\omega_k(x,\alpha_k) \times \beta_k Z\}$ or $\gamma_k\{x \times \beta_k Z\}$, where each $\gamma_k$ is now a single parameter rather than a $q$-dimensional vector. This model requires only 2 dose-covariate interaction parameters instead of $2q$. This is motivated by the idea that $\gamma_k\{\omega_k(x,\alpha_k) \times \beta_k Z\}$ is similar to the one-degree of freedom interaction term in the model for a two-way layout with one observation per cell given by Tukey (1949). Unfortunately, in practice, this parsimonious model is a complete disaster since, using either $\omega_k(x,\alpha_k)$ or $x$, it gives a very poor fit to the trial data when dose-covariate interactions of any complexity are present. So this more parsimonious model is a cute idea that simply doesn't work.

Generalizing the EffTox design to accommodate $Z$ requires much more than writing down a model. The set $\mathcal{A}_n(Z)$ of acceptable doses for a patient with covariates $Z$ is defined to be all $x \in \mathcal{X}$ satisfying the constraints

$$\Pr\{\pi_E(x,Z,\theta) < \underline{\pi}_E(Z) \mid \mathcal{D}_n \cup \mathcal{H}\} < p^*_E \quad\text{and}\quad \Pr\{\pi_T(x,Z,\theta) > \overline{\pi}_T(Z) \mid \mathcal{D}_n \cup \mathcal{H}\} < p^*_T, \tag{13}$$

where $\underline{\pi}_E(Z)$ and $\overline{\pi}_T(Z)$ are acceptability bounding functions, constructed as follows. First, a representative set of covariate vectors, $\{Z^{(1)},\ldots,Z^{(K)}\}$, is determined. For each $Z^{(i)}$, the physician specifies the smallest probability of efficacy, $\underline{\pi}_E^{(i)}$, and the largest probability of toxicity, $\overline{\pi}_T^{(i)}$, that are acceptable for a patient having those covariates. For $k = E,T$, denote $\zeta_k(Z) = E(\beta_k Z \mid \mathcal{H})$, the historical posterior mean of the covariate main effect linear combination. To construct the bounding function $\underline{\pi}_E(Z)$ for $\pi_E(x,Z,\theta)$, the $K$ pairs $(\zeta_E(Z^{(1)}),\underline{\pi}_E^{(1)}),\ldots,(\zeta_E(Z^{(K)}),\underline{\pi}_E^{(K)})$ of estimated linear terms and elicited lower bounds on $\pi_E$ are used as regression data to fit a simple linear or quadratic curve by least squares, using $\zeta_E(Z^{(i)})$ as the predictor and $\underline{\pi}_E^{(i)}$ as the outcome variable. Denoting the estimated outcome under the fitted regression model by $\underline{\pi}_E(\zeta_E(Z))$, the efficacy lower bounding function is $\underline{\pi}_E(Z) = \underline{\pi}_E \circ \zeta_E(Z)$. The toxicity upper bounding function $\overline{\pi}_T(Z) = \overline{\pi}_T \circ \zeta_T(Z)$ is computed similarly from $(\zeta_T(Z^{(1)}),\overline{\pi}_T^{(1)}),\ldots,(\zeta_T(Z^{(K)}),\overline{\pi}_T^{(K)})$. When constructing these functions, it is important to plot the scattergrams of the constructed regression data sets along with the fitted curves, which the physician may use to guide adjustment of some $\underline{\pi}_E^{(i)}$ or $\overline{\pi}_T^{(i)}$ values, if desired, to obtain acceptability bounding functions $\underline{\pi}_E(Z)$ and $\overline{\pi}_T(Z)$ that make sense clinically. These constructions map each patient's $Z$ vector into the probability bounds used in (13) to determine whether each $x \in \mathcal{X}$ is acceptable for that patient. To define a covariate-specific dose desirability index, we evaluate $\delta(p)$ given by (9) at $p = \mu^{(n)}(x,Z) = (E\{\pi_E(x,Z,\theta)\mid\mathcal{D}_n\}, E\{\pi_T(x,Z,\theta)\mid\mathcal{D}_n\})$, and denote this by $\delta_n(x,Z)$. For two patients with different covariates $Z_1 \neq Z_2$, it may be the case that $\mathcal{A}_n(Z_1) \neq \mathcal{A}_n(Z_2)$, including the possibility that $\mathcal{A}_n(Z) = \emptyset$ for one patient but not the other. Even if $\mathcal{A}_n(Z_1) = \mathcal{A}_n(Z_2)$, the $x$ that maximizes $\delta_n(x,Z_1)$ may not be the same as that maximizing $\delta_n(x,Z_2)$.
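The construction of an acceptability bounding function can be sketched as a small regression exercise. The code below fits a straight line by ordinary least squares to pairs of estimated linear terms and elicited lower bounds, and returns a function mapping a new linear term to a bound. The data are hypothetical, the names `fit_line` and `make_bound` are invented for illustration, and clipping the fitted value to $[0,1]$ is an added safeguard rather than part of the published method.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def make_bound(linear_terms, elicited_bounds):
    """Return a bounding function of the covariate linear term.

    linear_terms: estimated linear terms for the representative covariate
        vectors (the predictor in the regression).
    elicited_bounds: the physician's elicited probability limits for the
        same covariate vectors (the outcome in the regression).
    """
    intercept, slope = fit_line(linear_terms, elicited_bounds)

    def bound(linear_term):
        # Clip to [0, 1] so the result is always a valid probability
        # (an assumption of this sketch).
        return min(1.0, max(0.0, intercept + slope * linear_term))

    return bound


# Hypothetical regression data for three representative covariate vectors.
efficacy_lower_bound = make_bound([-1.0, 0.0, 1.0], [0.20, 0.30, 0.40])
bound_for_new_patient = efficacy_lower_bound(0.5)
```

A quadratic fit could be substituted for `fit_line` in the same way; in either case plotting the elicited points against the fitted curve remains the important step for checking that the resulting bounds are clinically sensible.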

Figure 3 illustrates how the dose-efficacy and dose-toxicity probability curves in $x$ also may change with $Z$. The curves are given for a particular fixed $\theta^{\mathrm{true}}$ in which interactive effects of $x$ and $Z$ are substantial, taken from the acute leukemia application discussed by Thall, Nguyen and Estey (2008) where $Z$ = (AGE, cytogenetic abnormality), with the second covariate coded as a three-category variable having possible values {Good, Intermediate, Poor} defined in terms of prognostic level. The rows in Figure 3 correspond to three different $Z$ values. In the left column, the probabilities $\pi_E(x,\theta)$ are represented by circles and $\pi_T(x,\theta)$ by triangles, with an open (filled) circle or triangle representing an unacceptable (acceptable) dose. The corresponding desirabilities are given in the right column, obtained by evaluating $\delta(\pi_E(x,Z,\theta^{\mathrm{true}}),\pi_T(x,Z,\theta^{\mathrm{true}}))$. The figure shows that the dose-outcome functions $\pi_E(x,\theta)$ and $\pi_T(x,\theta)$ may change dramatically with $Z$, that the effect of prognosis may be as large as or larger than that of dose, and that interactions between $x$ and $Z$ may be quite important. Figure 3 also illustrates how the desirability function $\delta$ reduces each two-dimensional $(p_E,p_T)$ to a one-dimensional value that may be used to compare doses for each $Z$.

To apply this methodology, the first step is to analyze $\mathcal{H}$ under several models, choose the model providing the best fit, compute $p(\beta,\psi\mid\mathcal{H})$, and determine noninformative priors on $\alpha$ and $\gamma$. During the trial, when a patient with covariates $Z$ is enrolled, $\mathcal{A}_n(Z)$ is computed. If $\mathcal{A}_n(Z) = \emptyset$, the patient is not treated on protocol. If $\mathcal{A}_n(Z) \neq \emptyset$, the patient is treated with the dose $x$ maximizing $\delta_n(x,Z)$. If $\mathcal{A}_n(Z^{(i)}) = \emptyset$ for all representative covariates, then the trial is stopped. After the trial, given final data $\mathcal{D}_N$, the decision rules based on $p(\theta\mid\mathcal{H}\cup\mathcal{D}_N)$ are used to select doses for future patients.



Our computer simulation studies of this new 
methodology have produced some disquieting mes- 
sages. The first is that ignoring established prog- 
nostic covariates may lead to either very unsafe or 
very ineffective dose assignments for many patients 
both during and after a phase I or phase I/II trial. 
The second message is that, if dose-covariate inter- 
actions are present, ignoring them by using an addi- 
tive model for the effects of x and Z also may lead 
to very poor dose assignments. That is, the common 
practice of ignoring known patient heterogeneity in 
early phase trials may lead to bad science and bad 
clinical practice. 

5. ACCOUNTING FOR MULTIPLE 
TOXICITIES 

5.1 Outcomes and Model 

Bekele and Thall (BT, 2004) proposed a dose-finding method based on a vector $Y = (Y_1,\ldots,Y_J)$ of several qualitatively different types of toxicity, with $Y_j$ an ordinal variable recording the $j$th toxicity's severity. The method was motivated by a phase I trial to choose a dose of gemcitabine, in mg/m$^2$, from $\{100, 200, \ldots, 1000\}$ when combined with a fixed dose of 50 cGy external beam radiation, both given prior to surgery, for patients with soft tissue sarcoma. The design was developed working with a team of three oncologists who had extensive experience treating sarcomas. The point of departure from conventional methods is that the design distinguishes between different types of toxicity, and it also accounts for the severity levels of each.

Fig. 3. Marginal outcome probabilities $\pi_E(x,\theta)$, given as circles, and $\pi_T(x,\theta)$, given as triangles (left column) and corresponding desirabilities (right column) for each of three different patient prognostic vectors, under a fixed $\theta$. Solid (open) points correspond to acceptable (unacceptable) doses.

Denote the $m_j + 1$ severity levels of $Y_j$ by $\{y_{j,0}, y_{j,1},\ldots,y_{j,m_j}\}$. For example, in the sarcoma trial the 4 levels of liver toxicity were $y_{j,0}$ = {grade 0 or 1}, $y_{j,1}$ = {grade 2}, $y_{j,2}$ = {grade 3} and $y_{j,3}$ = {grade 4}. Binary $Y_j$ corresponds to $m_j = 1$. Using standardized doses $x = \log\{(\text{raw dose})/1000\}$, so that $\mathcal{X} = \{-2.30, -1.61, \ldots, 0\}$, the distribution of $Y\mid x$ was modeled using the method of Albert and Chib (1993), in terms of the $J$-vector of Gaussian latent variables $\zeta = (\zeta_1,\ldots,\zeta_J)$ with $E(\zeta_j\mid x) = \beta_{j,0} + x\beta_{j,1}$, $\mathrm{var}(\zeta_j) = 1$ and correlation matrix $\Omega$, by defining $Y_j = y_{j,k}$ if $\gamma_{j,k} < \zeta_j \leq \gamma_{j,k+1}$ for $k = 0,1,\ldots,m_j$ and $j = 1,\ldots,J$ for cut-off parameters $\gamma_j = (\gamma_{j,1},\ldots,\gamma_{j,m_j})$ satisfying $-\infty = \gamma_{j,0} < \gamma_{j,1} < \cdots < \gamma_{j,m_j} < \gamma_{j,m_j+1} = +\infty$, with $\gamma_{j,1} = 0$ to ensure identifiability. This formulation greatly facilitates MCMC computations used to obtain posterior quantities. Denoting the $2J$-vector of regression parameters $\beta = (\beta_{1,0},\beta_{1,1},\ldots,\beta_{J,0},\beta_{J,1})$, the vector $\gamma = (\gamma_1,\ldots,\gamma_J)$ having $m_+ = m_1 + \cdots + m_J$ entries, and the $J(J-1)/2$ off-diagonal elements of $\Omega$ by $\rho = (\rho_{1,2},\rho_{1,3},\ldots,\rho_{J-1,J})$, the model parameter vector is $\theta = (\beta,\gamma,\rho)$. The marginal distribution of $Y_j\mid x$ is given by

$$\pi_{j,k}(x,\theta) = \Pr(Y_j = y_{j,k}\mid x,\theta) = \Phi(\gamma_{j,k+1} - \beta_{j,0} - \beta_{j,1}x) - \Phi(\gamma_{j,k} - \beta_{j,0} - \beta_{j,1}x). \tag{14}$$
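To make the latent-variable formulation concrete, the marginal severity probabilities in (14) can be computed for a single toxicity type as differences of standard normal c.d.f. values at consecutive cut-offs. The sketch below uses only the Python standard library; the parameter values are hypothetical, and the function names are not from BT.

```python
import math


def std_normal_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def severity_probs(x, beta0, beta1, cutoffs):
    """Probabilities of the m_j + 1 severity levels of one toxicity, as in (14).

    x: standardized dose, log(raw dose / 1000).
    beta0, beta1: intercept and slope of the latent mean (slope > 0).
    cutoffs: finite cut-offs [gamma_1, ..., gamma_m], increasing,
        with gamma_1 = 0 for identifiability.
    """
    eta = beta0 + beta1 * x
    # Pad with the infinite end points gamma_0 and gamma_{m+1}.
    gammas = [-math.inf] + list(cutoffs) + [math.inf]
    return [
        std_normal_cdf(gammas[k + 1] - eta) - std_normal_cdf(gammas[k] - eta)
        for k in range(len(gammas) - 1)
    ]


# Hypothetical parameter values for a toxicity with 4 severity levels.
cutoffs = [0.0, 0.8, 1.6]
probs_low_dose = severity_probs(math.log(100 / 1000), -0.5, 0.8, cutoffs)
probs_high_dose = severity_probs(math.log(1000 / 1000), -0.5, 0.8, cutoffs)
```

With a positive slope, the probability of any level above the lowest increases with dose, which is the monotonicity that the truncated prior on the slopes is meant to guarantee.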

To obtain an expression for the joint distribution, denote the p.d.f. of a multivariate normal random vector $W$ with mean vector $\mu$ and variance-covariance matrix $\Sigma$ by $\phi_W(\cdot\mid\mu,\Sigma)$. In matrix notation, $E(\zeta\mid x) = X\beta'$, where $X$ is the $J \times 2J$ block diagonal matrix with $J$ identical blocks $(1\ \ x)$. Denote the intervals $G_{j,k} = (\gamma_{j,k},\gamma_{j,k+1}]$. For observed vector $k = (k_1,\ldots,k_J)$ of toxicity severity levels, the outcome is $Y = y(k) = (y_{1,k_1},\ldots,y_{J,k_J})$, which corresponds to latent $\zeta$ values in the $J$-dimensional set $G(k,\gamma) = G_{1,k_1} \times \cdots \times G_{J,k_J}$. A single patient's likelihood contribution is

$$\mathcal{L}(Y\mid x,\theta) = \prod_{k_1=0}^{m_1}\cdots\prod_{k_J=0}^{m_J}\left[\int_{G(k,\gamma)}\phi_\zeta(z\mid X\beta',\Omega)\,dz\right]^{I[Y = y(k)]}. \tag{15}$$

For priors, BT assume $\beta \sim N(\mu,\Sigma)$, subject to $\Pr(\beta_{j,1} > 0) = 1$ for all $j = 1,\ldots,J$, so that $\beta$ is $2J$-variate normal with all slope coefficients truncated below at 0, but $\mu$ and $\Sigma$ correspond to the untruncated $2J$-variate normal. This ensures that $\Pr(Y_j \geq y_{j,k}\mid x,\beta) = 1 - \Phi(\gamma_{j,k} - \beta_{j,0} - \beta_{j,1}x)$ increases with $x$ for each $j$ and $k \geq 1$. For each $j$ with $m_j \geq 2$ (3 or more levels), $\{\gamma_{j,2},\ldots,\gamma_{j,m_j}\}$ follow independent, uninformative priors on the domain $[0,10]$, with each $p(\gamma_{j,k}) \propto 1$, subject to the constraint $0 < \gamma_{j,2} < \gamma_{j,3} < \cdots < \gamma_{j,m_j}$, where the upper limit 10 on the support of each $p(\gamma_{j,k})$ was chosen for numerical convenience. The $\rho_{j,k}$'s are assumed to be i.i.d. $N(0,1000)$, truncated to have support $[-1,+1]$, with $\Omega$ positive definite.

5.2 Total Toxicity Burden and Trial Conduct 

The dose-finding method is based on toxicity severity weights, elicited as follows. The oncologists are first asked to specify the $J$ toxicities to be monitored, including the severity levels of each. They are then asked to specify a numerical severity weight for each level of each toxicity within a positive-valued numerical range with which they are comfortable, such as 0 to 10, or 0 to 100. The severity weights are denoted by $w = (w_1,\ldots,w_J)$, where $w_j = (w_{j,0},w_{j,1},\ldots,w_{j,m_j})$ are the severity weights of the possible values $(y_{j,0},y_{j,1},\ldots,y_{j,m_j})$ of $Y_j$, with the obvious requirement $w_{j,0} < w_{j,1} < \cdots < w_{j,m_j}$; otherwise, if $w_{j,k} = w_{j,k+1}$, then levels $k$ and $k+1$ of $Y_j$ should be combined. The elicited severity weights used in the sarcoma trial are illustrated in Figure 4. An interesting practical point arose while assigning severity weights to myelosuppression, which is defined in terms of low blood cell counts and is caused by effects of chemotherapy on the bone marrow. At first, no distinction was made between myelosuppression occurring either with or without fever. During the process of establishing $w$, however, the oncologists explained that myelosuppression is much more severe when it occurs with fever, since it is then life-threatening and may be an impediment to further chemotherapy. This led us to redefine myelosuppression occurring without or with fever as two different types of toxicity. Figure 4 shows the severity weights 1 and 1.5 for grade 3 and 4 myelosuppression without fever, compared to weights 5 and 6 for grade 3 and 4 myelosuppression with fever. Thus, in general, $Y$ and $w$ are elicited together, and this process is likely to involve iteration.

A patient's total toxicity burden (TTB) is defined to be

$$\mathrm{TTB} = \sum_{j=1}^{J}\sum_{k=1}^{m_j} w_{j,k}\,I(Y_j = y_{j,k}). \tag{16}$$

For example, from Figure 4, a patient with grade 3 fatigue, grade 3 nausea/vomiting and grade 4 myelosuppression without fever would have TTB = 0.5 + 1.5 + 1.5 = 3.5, whereas a patient with grade 4 myelosuppression with fever would have TTB = 6.0. Using the conventional approach of defining a single binary outcome $Y$ indicating at least one grade 3 or 4 toxicity, these two patients would be scored identically, with both having $Y = 1$.
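The TTB in (16) is simply a sum of elicited weights over the toxicities a patient experiences. The sketch below reproduces the two patients in the example, using only the severity weights that are quoted in the text; the remaining weights in Figure 4 are not reproduced, and the dictionary layout and function name are illustrative choices.

```python
# Severity weights quoted in the text for the sarcoma trial, keyed by
# (toxicity type, grade). Weights not quoted in the text are omitted.
WEIGHTS = {
    ("fatigue", 3): 0.5,
    ("nausea/vomiting", 3): 1.5,
    ("myelosuppression without fever", 3): 1.0,
    ("myelosuppression without fever", 4): 1.5,
    ("myelosuppression with fever", 3): 5.0,
    ("myelosuppression with fever", 4): 6.0,
}


def total_toxicity_burden(observed):
    """Sum the severity weights of a patient's observed toxicities, as in (16).

    observed: iterable of (toxicity type, grade) pairs, at most one grade
        per toxicity type.
    """
    return sum(WEIGHTS[toxicity] for toxicity in observed)


patient_a = total_toxicity_burden([
    ("fatigue", 3),
    ("nausea/vomiting", 3),
    ("myelosuppression without fever", 4),
])
patient_b = total_toxicity_burden([("myelosuppression with fever", 4)])
```

A single binary toxicity indicator would score both patients as 1, whereas the TTB separates them by a wide margin.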

The posterior expected TTB of dose $x$ is

$$\tau(x,\mathcal{D}_n) = E(\mathrm{TTB}\mid x,\mathcal{D}_n) = \sum_{j=1}^{J}\sum_{k=1}^{m_j} w_{j,k}\,E\{\pi_{j,k}(x,\theta)\mid\mathcal{D}_n\}. \tag{17}$$

The trial is conducted by establishing a targeted total toxicity burden, TTB*, and choosing each cohort's $x$ to minimize $|\tau(x,\mathcal{D}_n) - \mathrm{TTB}^*|$. This is analogous to choosing a dose, based on a binary $Y$ with $\pi(x,\theta) = \Pr(Y = 1\mid x,\theta)$, using the CRM criterion to minimize $|E\{\pi(x,\theta)\mid\mathcal{D}_n\} - \pi^*|$ for given target probability $\pi^*$. It is easy to show that, since $w_{j,k-1} < w_{j,k}$ and $\beta_{j,1} > 0$ for all $j$ and $k$, $\tau(x,\mathcal{D}_n)$ is increasing in $x$, so $x$ may be determined by a monotone search.

Fig. 4. The elicited toxicity severity weights used in the soft tissue sarcoma trial. [Bar chart titled "Elicited Toxicity Severity Weights", with bars by grade for fatigue, nausea/vomiting, liver toxicity, dermatitis, myelosuppression without fever and myelosuppression with fever.]

The process proposed by BT for establishing the target TTB* is straightforward, albeit somewhat elaborate. The physicians are first asked to specify a set of hypothetical patient cohorts and toxicity outcomes for each patient in each cohort, with the cohorts defined so that the toxicity severities vary substantially between cohorts. BT provide a detailed description of this process, and in the sarcoma trial there were 16 hypothetical cohorts of 4 patients each with the mean TTB of each cohort varying from TTB = 1.25 to 5.62. For each hypothetical cohort, the oncologists are asked whether observing its toxicity outcomes would lead them to escalate, repeat the current dose, or de-escalate for the next cohort. The target TTB* is then defined as the mean of the TTB values for which the decision would be to repeat the current dose. For the sarcoma trial, this yielded TTB* = 3.04. Computer simulations of this methodology provided by BT show that it has remarkably attractive OCs and makes decisions very differently from conventional phase I designs. A cohort of four patients all with myelosuppression grade 4 without fever, patients #1, #2 and #3 with grade 3 fatigue, and patient #4 with grade 3 nausea/vomiting would have mean TTB = {(1.5 + 0.5) + (1.5 + 0.5) + (1.5 + 0.5) + (1.5 + 1.5)}/4 = 2.25. The three oncologists all agreed that the appropriate decision based on these outcomes would be to escalate, whereas any conventional method based on one binary toxicity indicator would score this as 4 "toxicities" in 4 patients and certainly would de-escalate.
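The dose-selection step and the cohort example can be sketched in a few lines. The code below assumes that the posterior expected TTB of each dose has already been computed elsewhere; the values in `expected_ttb` are hypothetical placeholders, not results from the sarcoma trial, and only the target 3.04 and the cohort arithmetic come from the text.

```python
def choose_dose(expected_ttb_by_dose, target_ttb):
    """Return the dose whose posterior expected TTB is closest to the target."""
    return min(
        expected_ttb_by_dose,
        key=lambda dose: abs(expected_ttb_by_dose[dose] - target_ttb),
    )


# Hypothetical posterior expected TTB values by raw dose (mg/m^2),
# increasing in dose as the model requires.
expected_ttb = {100: 1.2, 200: 2.1, 300: 2.9, 400: 3.6}
next_dose = choose_dose(expected_ttb, target_ttb=3.04)

# Mean TTB of the four-patient cohort described in the text:
# three patients with 1.5 + 0.5 and one patient with 1.5 + 1.5.
cohort_ttb = [1.5 + 0.5, 1.5 + 0.5, 1.5 + 0.5, 1.5 + 1.5]
cohort_mean = sum(cohort_ttb) / len(cohort_ttb)
```

Since the cohort mean of 2.25 lies below the target of 3.04, a TTB-based rule points toward escalation, in agreement with the oncologists' judgment described above.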

6. OPTIMIZING DOSE AND SCHEDULE 

6.1 A New Paradigm for Phase I Trials 

Braun et al. (BTND, 2007) proposed a new paradigm for phase I trials that jointly optimizes schedule of administration and per-administration dose (PAD) based on time-to-toxicity. This extends Braun, Yuan and Thall (2005), who optimized schedule while assuming a fixed PAD. Although the model used by BTND is very different from that underlying the TITE-CRM (Cheung and Chappell, 2000) for dose-finding based on time-to-toxicity, the BTND method is a practical extension in that it allows schedule as well as dose to be varied. The treatment regime is $x = (s,d_s)$, where $s = (s_1,\ldots,s_k)$ are successive administration times and $d_s = (d(s_1),\ldots,d(s_k))$ are the doses given at those times. BTND address the problem of evaluating a $K \times J$ matrix of $K$ nested schedules, $s^{(1)} \subset s^{(2)} \subset \cdots \subset s^{(K)}$, where the $k$th schedule is $s^{(k)} = (s_1,s_2,\ldots,s_{m^{(k)}})$, so that $m^{(1)} < m^{(2)} < \cdots < m^{(K)}$, and $J$ PADs, $d^{(1)} < d^{(2)} < \cdots < d^{(J)}$. The treatment set evaluated by the design is $\mathcal{X} = \{(s^{(k)},d^{(j)}) : k = 1,\ldots,K,\ j = 1,\ldots,J\}$, and the total amount of the agent given to the patient increases with both dose and schedule. For example, a patient assigned PAD $d^{(j)}$ under schedule $s^{(2)} = (s_1,s_2,\ldots,s_{m^{(2)}})$ receives total dose $d^{(j)}m^{(2)}$ of the agent in $m^{(2)}$ successive administrations of $d^{(j)}$ each, unless therapy is terminated early due to toxicity, so the planned $d_{s^{(2)}}$ in $x = (s^{(2)},d_{s^{(2)}})$ is the $m^{(2)}$-vector with all entries $d^{(j)}$.

For this regime, it is helpful to distinguish between 
two time scales, study time and patient time. Start- 
ing at study time when the trial begins, let e be a 
given patient's entry time, so that the patient's as- 
signed schedule s is administered at study times e + 
s = (e + si, . . . , e + Sk) - Denote a patient's time from 
entry at e to toxicity by T, so that at study time t 
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the patient's observed time to toxicity or last follow- 
up is T°(t) = T if e + T < t and T°(t) =t-eife + 
T>t. Denning 5(t) = I(e + T<t), the patient's out- 
come data at study time t are Y(t) = (T°(t),5(t)). 
The probability model is constructed from the pa- 
tient's hazard of toxicity, h(u\d,9), associated with 
a single administration of dose d given u days pre- 
viously, and we denote H(x\d, 9) = f£ h(u\d,0) du. 
Under the assumption that effects of successive ad- 
ministrations of the agent are additive, the overall 
hazard of toxicity at study time t for a patient en- 
tering at e and treated with x = (s, d s ) is 



(18) X(t\e,(s,d s ),9) 



k 

£ 



h(t 



Sj\d(sj),6), 



BTND assume that, for each j = 1, . . . , J, the single- 
administration hazard function associated with dose 
d^ is a triangle, formally 



where h(u\d,9) = for all u < 0. The patient's cu- 
mulative hazard of toxicity at study time t is thus 

k 

(19) A(i|e, (s,d s ),0) = Y J H{t-e- Sj \d( Sj ),9), 

i=i 

and the probability that the patient has not had tox- 
icity by study time t is Pr(e + T > t\e, (s,d s ),9) = 
exp{— A(t|e, (s, d s ),9}. Thus, h and H are expressed 
in terms of patient time, whereas A and A are ex- 
pressed in terms of study time. The probability dis- 
tribution of T is determined by the particular form 
of the single administration hazard function h. 

The model allows each patient's actual x = (s, d s ) 
received to fall outside the set of KJ treatment con- 
figurations in X, provided that each dose in d s is 
an element of {d^\ . . . ,d^}. In particular, the el- 
ements of d s need not be identical. This accommo- 
dates the possibility that a patient's treatment does 
not go as planned, for example, due to interim dose 
reductions following moderate toxicity or deviations 
from the planned schedule. It also allows the possi- 
bility that the patient's planned x may be changed 
before the schedule is completed, based on other pa- 
tients' data observed during the patient's therapy. 

At study time t, let Xi(t) denote the portion of the 
iih patient's treatment regime Xi that has been ad- 
ministered by that time and let V t = {(T°(t), Si(t), 
ei, Xi(t)),i = 1, . . . , n(t)} denote the current data. The 
likelihood at study time t is 



n(t) 



C(V t \9) = H{\(T°(t)\e uXi (t),9)} 



Si(t) 



(20) 



i=i 



h(u\d( j \9j) 



2a jU 

I(0<u<Pj) 



(21) 



(&+7i)& 

2aj(J3j + ij -u) 



+ 



■I(f3 j <u</3 j + lj ), 

where 9j = (aj, Pj,"fj), so that 9 = (9i, . . . ,9j) has 
3 J elements. The jth triangle has base of length (3j + 
7j, area aj, and maximum height h(f3j\d^\9j) = 
2a j/ (/3 j + 7j). Thus, for u > f3j + jj, the cumulative 
single-administration hazard is Hj(u\d^\9j) =otj. 
Under this model, since any schedule (si, . . . , s&) is 
finite, given 9 and k- vector of PADs d s = (d^ , . . . , 
d^), the cumulative hazard A(t\(s , d s ) , 9) has the 
finite maximum value kaj. Consequently, given 9, 
the probability that the patient never experiences 
toxicity is F(t\(s,d s ),9) = exp(-kaj) for all t > s& + 
f3j + 7j, with the obvious elaboration of the upper 
limit on F (t\(s , d s ) , 9) if the elements of d s are not 
identical. 

The triangular form of $h$ may seem to be an oversimplification
of a complex phenomenon. In application, however, it is quite
flexible and yields a very robust trial design. Figure 5 shows the
cumulative hazard of toxicity for a patient treated with a fixed
PAD according to the 4-administration schedule $s = (0, 3, 10, 13)$.
The shaded area represents $H(12)$, the cumulative hazard of
toxicity by day 12. The smoothness of the curve $H(u)$ for
$0 < u < 40$ that results from summing four triangles and the fact
that the parameters $(\alpha_j, \beta_j, \gamma_j)$ characterizing the $j$th
triangle corresponding to $d^{(j)}$ are estimated from the
accumulating data together provide an intuitive motivation for the
model's flexibility and robustness. This was borne out by the
extensive simulation studies reported by BTND.

Fig. 5. Illustration of triangular component hazards for
administrations on days 0, 3, 10 and 13, and the resulting
cumulative hazard function. The shaded region is the cumulative
hazard, $H(12)$, of toxicity by day 12. [Horizontal axis: Time (in days).]

In the setting where only
schedule is varied with PAD fixed, Liu and Braun
(2009) have studied the use of a smooth component
hazard function, a 2-parameter Weibull,
$h(u \mid \alpha, \beta) = e^{\beta}\alpha u^{\alpha-1}\exp(-u^{\alpha}e^{\beta})$ for $u > 0$,
where $\alpha > 0$ and $\beta$
is real-valued. This allows $h$ to be nonmonotone if
$\alpha > 1$ or decreasing if $0 < \alpha \le 1$.

6.2 Trial Conduct 

Given an interval $[0, t^*]$ large enough to reliably
evaluate $T$ under the longest schedule, the physician
specifies a target $\pi^* = \Pr(T \le t^*)$. For brevity,
I will temporarily index a patient's assigned treatment
$(s^{(k)}, d^{(j)})$ by $(k, j)$, and denote the c.d.f. of
$T$ associated with $x = (k, j)$ by
$F_{kj}(\theta) = \Pr(T \le t^* \mid (k, j), \theta)$. Since
$\mathcal{L}(\mathcal{D}_t \mid \theta)$ and hence the posterior
$p(\theta \mid \mathcal{D}_t)$ change continuously with $t$ during the trial,
necessarily, $x \in X$ is chosen for each newly enrolled
patient, that is, $c = 1$. A patient accrued at study
time $t$ is assigned the pair $(k, j) \in X$ minimizing the
objective function $|E\{F_{kj}(\theta) \mid \mathcal{D}_t\} - \pi^*|$, similar to
the CRM. Assignment of $x$ using this criterion is
subject to the following two safety rules. Given a
maximum toxicity probability, $F_{\max}$, specified by
the physician, the schedule-dose pair $(k, j)$ is acceptable
if $\Pr(F_{kj}(\theta) > F_{\max} \mid \mathcal{D}_t) < p^*$, where $p^*$ is a
fixed upper cut-off such as 0.80 or 0.90. If no pair in
$X$ is acceptable, the trial is stopped. This is similar
to the toxicity portion of the acceptability criteria
(8) of the EffTox methodology. The second safety
rule is that escalation from $(k, j)$ is restricted in
that no untried dose-schedule combination may be
skipped; specifically, the next patient may be treated
at $x = (k+1, j)$, $(k, j+1)$ or $(k+1, j+1)$, but at
no higher pair. There is no such constraint on
de-escalation. While developing this methodology, we
initially tried using the more restrictive constraint
that does not allow diagonal escalation, from $(k, j)$
to untried $(k+1, j+1)$, but this yielded a design
with very poor properties. This is the case essentially
because this constraint makes exploration of
the 2-dimensional set of $KJ$ schedule-PAD pairs
infeasible, and, in fact, it provides no additional
measure of safety. While BTND recommended that the
first patient be treated at the safest pair $(k, j) = (1, 1)$,
in practice, the physician might wish to start
at (1,2), (2,1) or (2,2).
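
The assignment rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: posterior draws of the toxicity probability for each schedule-dose pair are taken as given (in practice they would come from MCMC or another numerical method), the escalation restriction is read as allowing any pair (k', j') with k' <= k + 1 and j' <= j + 1, and the function name, argument names and numbers are all hypothetical.

```python
def choose_pair(post_draws, pi_star, f_max, p_star, current):
    """Pick the next schedule-dose pair from posterior draws of F_kj(theta).

    post_draws maps each pair (k, j) to a list of posterior draws of F_kj(theta).
    Returns None if no pair is both acceptable and reachable from `current`;
    when no pair at all is acceptable, this corresponds to stopping the trial.
    """
    k0, j0 = current
    candidates = []
    for (k, j), draws in post_draws.items():
        # Safety rule 1: posterior probability of excessive toxicity must be below p_star.
        prob_excess = sum(f > f_max for f in draws) / len(draws)
        if prob_excess >= p_star:
            continue
        # Safety rule 2 (one reading, an assumption): move up at most one level
        # in schedule and in dose; de-escalation is unrestricted.
        if k > k0 + 1 or j > j0 + 1:
            continue
        post_mean = sum(draws) / len(draws)
        candidates.append((abs(post_mean - pi_star), (k, j)))
    if not candidates:
        return None
    # Smallest distance between the posterior mean and the target; ties go to the lowest pair.
    return min(candidates)[1]

# Hypothetical posterior draws for five pairs (three draws each, for brevity).
post = {
    (1, 1): [0.05, 0.10, 0.15],
    (1, 2): [0.20, 0.30, 0.40],
    (2, 1): [0.15, 0.25, 0.35],
    (2, 2): [0.50, 0.60, 0.70],   # fails the acceptability rule when f_max = 0.40
    (3, 3): [0.29, 0.30, 0.31],   # acceptable, but not reachable from (1, 1)
}
print(choose_pair(post, pi_star=0.30, f_max=0.40, p_star=0.80, current=(1, 1)))
```

Keeping the two safety rules as separate filters before the minimization mirrors the description above, in which the criterion is applied only to pairs that pass both rules.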

It may seem self-evident that this method is greatly 
superior to any comparable method that fixes schedule
and only varies dose, since an optimal combination
$(s^{(k)}, d^{(j)})$ is simply ignored if the fixed schedule
is not $s^{(k)}$. The simulations reported by BTND
clearly illustrate this point. Currently, however, it 
still is standard practice in phase I trials to guess 
what schedule might be best, possibly based on an- 
imal data, and proceed in humans by varying only 
dose. As described by BTND, this new methodology 
has been used to conduct an allotx trial of the post- 
transplant agent 5-azacitidine, which is thought to 
kill leukemia cells by reactivating tumor suppressor 
genes while also enhancing graft-versus-leukemia ef- 
fect. 

7. DISCUSSION 

Each of the designs reviewed here includes one or 
more aspects of treatment or outcome in an
early phase trial that are ignored by standard de- 
signs. The price of accommodating such additional 
complexity is a much more structured model and 
method. This often requires substantially more work 
for trial design and conduct, including analysis of 
historical data, elicitation of priors and design pa- 
rameters, development of computer software, car- 
rying out simulations to calibrate design parame- 
ters and establish operating characteristics, and the 
difficult process of real time data monitoring dur- 
ing trial conduct. In each case, however, the design 
provides advantages over standard methods so large 
that the comparisons may seem unfair. Evidently, 
accounting for both anti-disease effect and toxicity 
is a good idea, ignoring covariates is a bad idea, 
quantifying the clinical importance of different types 
and grades of toxicities is a good idea, and ignoring 
schedule effects is a bad idea. For example, the
optimal treatment pair determined by the design in
the 5-azacitidine trial was (40 mg/m$^2$ per
administration, 3 cycles), which simply could not have been
found using a conventional dose-finding method that 
fixes schedule at 1 cycle. While each of the designs 
relies on a model with a nontrivial number of param- 
eters, which in turn often requires elaborate prior 
specification and sophisticated numerical methods, 
the amount of information per patient also is much 
greater. The final questions are whether such designs 
have good properties, which the computer simulations
show they do, and whether they can be implemented
in practice, which has been the case for all
of the designs discussed here. 

For most early phase trials, conventional meth- 
ods for determining sample size based on hypothesis 
testing or estimation may be of little use. To determine
a planned maximum sample size for a phase
I or I/II trial, I ask the physicians the anticipated
accrual rate, which often is a range of values, the 
desired maximum trial duration, and cost or other 
resource limitations, such as the amount of a spe- 
cialized agent that feasibly can be produced in the 
laboratory. For each of several feasible maximum 
sample sizes, I simulate the trial and also compute 
posterior estimates of important parameters based 
on illustrative data sets. I then show these results to 
the physician and ask him/her to choose a maximum 
sample size on that basis. If the largest feasible sam- 
ple size does not yield a reasonably reliable design, 
I recommend that the trial not be conducted. Using 
this practical approach, in my experience planned 
phase I or I/II sample sizes usually range from 24 
to 60. 
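
The planning steps above can be mimicked with a small Python sketch. It is a deliberately simplified stand-in, not any of the designs reviewed here: feasibility is judged only from an assumed accrual rate and maximum duration, and the posterior estimates are those of a single toxicity probability under a conjugate beta-binomial model with a hypothetical Beta(0.5, 0.5) prior and illustrative data in which one quarter of the patients have toxicity. All names and numbers are assumptions chosen for illustration.

```python
import math

def feasible_sizes(candidates, accrual_per_month, max_months):
    """Keep the candidate maximum sample sizes that can be accrued within the time limit."""
    limit = accrual_per_month * max_months
    return [n for n in candidates if n <= limit]

def beta_posterior(n, n_tox, a=0.5, b=0.5):
    """Posterior mean and standard deviation of a toxicity probability.

    Assumes a Beta(a, b) prior and n_tox toxicities among n patients.
    """
    a_post = a + n_tox
    b_post = b + (n - n_tox)
    total = a_post + b_post
    mean = a_post / total
    sd = math.sqrt(a_post * b_post / (total ** 2 * (total + 1.0)))
    return mean, sd

candidates = [24, 36, 48, 60]
# Assumed: 2 patients per month (low end of the anticipated range), at most 24 months.
sizes = feasible_sizes(candidates, accrual_per_month=2.0, max_months=24.0)

for n in sizes:
    n_tox = n // 4  # illustrative data set with 25% toxicity
    mean, sd = beta_posterior(n, n_tox)
    print(f"N = {n}: posterior mean {mean:.3f}, posterior sd {sd:.3f}")
```

A table of this kind, produced for each feasible maximum sample size, is the sort of summary that could be shown to the physician alongside the simulation results.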

The most severe difficulties in achieving widespread 
implementation of outcome-adaptive methods in early 
phase clinical trials are computational and sociologi- 
cal. The first practical requirement is portable, high 
quality computer software for implementation, in- 
cluding statistical programs that perform the neces- 
sary computations for specific methods and, ideally, 
graphical user interfaces that communicate with es- 
tablished databases and statistical programs to facil- 
itate real-time data entry and computation of adap- 
tive decision criteria during the trial. An "elephant 
in the living room" of outcome-adaptive methods for 
clinical trials is that constructing and implementing 
such information systems in medical environments 
is often much more difficult, expensive and time- 
consuming than developing a particular statistical 
method. Moreover, once such a system is in place, 
the process of entering the patient data required by 
an outcome-adaptive method is time-consuming and 
potentially error prone. 

A natural question is whether one can construct 
practical designs that address the problems that arose 
in the SCT trial described in the Introduction. Such 
designs would optimize multiple schedule-dose com- 
binations of several agents used in combination based 
on a vector of appropriately chosen efficacy and
toxicity outcomes, possibly accounting for patient
covariates and schedule-dose-covariate interactive
effects on the outcomes, while adaptively choosing
patient-specific schedule-dose combinations in real 
time. This also would require a decision criterion 
based on multiple outcomes, possibly using either 
efficacy-toxicity trade-offs or numerical utilities 
(Houede et al., 2010). Currently, we are working to 
develop new designs that include various combina- 
tions of these extensions. In my experience, how- 
ever, early phase trials are so complex that most 
trials cannot be optimally designed until after they 
already have been carried out, and a "one size fits 
all" design simply does not exist. 

Clinical trials are viewed very differently by the 
pharmaceutical companies who produce and supply 
new agents, by regulatory agencies, by institutional 
review boards, by administrators who provide in- 
frastructure and resources for trial conduct, by the 
physicians and nurses who actually treat the pa- 
tients in a trial, and by the patients themselves. In- 
dividuals with decision-making authority in all of 
these different groups must agree on a trial design 
before a trial may be conducted. Many of these peo- 
ple regard the structure and properties of a partic- 
ular statistical design as technicalities too compli- 
cated to understand and at most marginally rele- 
vant. Most early phase trials are conducted using 
very simple conventional methods that do not re- 
quire computers and are easy to implement. Ac- 
counting more fully for the complexities of both the 
actual treatment regimes and the patients' clinical 
outcomes is a double-edged sword, since the greater
safety and reliability that such methods provide is 
obtained only by working much harder in both de- 
sign formulation and trial conduct. Physicians who 
understand the advantages of properly constructed 
outcome-adaptive designs and want to use such meth- 
ods are a minority, although they often provide the 
initial motivation for developing new statistical de- 
signs. Their desire to use outcome-adaptive meth- 
ods, and the recent shift in the pharmaceutical com- 
munity, at least among statisticians, to embrace all 
things "adaptive" in clinical trials seem to be 
harbingers of a different future. How this may ac- 
tually translate into practical reality in the coming 
years remains to be seen. 

The website http://biostatistics.mdanderson.org/SoftwareDownload
contains computer programs for implementing the methods
described in Sections 2 (ToxFinder), 3.1 and 3.2 (EffTox)
and 5 (Dose Schedule Finder).






ACKNOWLEDGMENT 

This research was partially supported by NCI Grant 
2R01 CA083932. 

REFERENCES 

Albert, J. H. and Chib, S. (1993). Bayesian analysis of bi- 
nary and polytomous response data. J. Amer. Statist. As- 
soc. 88 669-679. MR1224394 

Andersson, B. S., Thall, P. F., Madden, T., Couriel, 
D., Wang, X., Tran, H. T., Anderlini, P., de Lima, 
M., Gajewski, J. and Champlin, R. E. (2002). Busulfan 
systemic exposure relative to regimen-related toxicity and 
acute graft vs. host disease; defining a therapeutic window 
for IVBuCy2 in chronic myelogenous leukemia. Biology of 
Blood and Marrow Transplantation 8 477-485. 

Babb, J. S. and Rogatko, A. (2001). Patient specific dosing 
in a phase I cancer trial. Stat. Med. 20 2079-2090. 

Bekele, B. N. and Thall, P. F. (2004). Dose-finding based 
on multiple toxicities in a soft tissue sarcoma trial. J. Amer. 
Statist. Assoc. 99 26-35. MR2061885 

Bekele, B. N., Ji, Y., Shen, Y. and Thall, P. F. (2008). 
Monitoring late onset toxicities in phase I trials using pre- 
dicted risks. Biostatistics 9 442-457. 

Braun, T. (2002). The bivariate continual reassessment 
method: Extending the CRM to phase 1 trials of two com- 
peting outcomes. Controlled Clinical Trials 23 240-256. 

Braun, T. M., Yuan, Z. and Thall, P. F. (2005). Deter- 
mining a maximum tolerated schedule of a cytotoxic agent. 
Biometrics 61 335-343. MR2140904 

Braun, T. M., Thall, P. F., Nguyen, H. and de Lima, 
M. (2007). Simultaneously optimizing dose and schedule of 
a new cytotoxic agent. Clinical Trials 4 113-124. 

Cheung, Y. K. (2005). Coherence principles in dose-finding 
studies. Biometrika 92 863-873. MR2234191 

Cheung, Y. and Chappell, R. (2000). Sequential designs for 
phase I clinical trials with late-onset toxicities. Biometrics 
56 1177-1182. MR1815616 

Dette, H., Bretz, F., Pepelyshev, A. and Pinheiro, J.
(2008). Optimal designs for dose-finding studies. J. Amer. 
Statist. Assoc. 103 1225-1237. MR2462895 

Goodman, S. N., Zahurak, M. L. and Piantadosi, S.
(1995). Some practical improvements in the continual re- 
assessment method. Stat. Med. 14 1149-1161. 

Gooley, T. A., Martin, P. J., Fisher, L. D. and Pet- 
tinger, M. (1994). Simulation as a design tool for phase 
I/II clinical trials: An example from bone marrow trans- 
plantation. Controlled Clinical Trials 15 450-462. 

Haines, L. M., Perevozskaya, I. and Rosenberger, W. 
F. (2003). Bayesian optimal designs for phase I clinical 
trials. Biometrics 59 591-600. MR2004264 

Horstmann, E., McCabe, M. S., Grochow, L., Yama- 
moto, S., Rubinstein, L., Budd, T., Shoemaker, D., 
Emanuel, E. J. and Grady, C. (2005). Risks and benefits 
of phase 1 oncology trials, 1991 through 2002. New England 
J. Medicine 352 895-904. 

Houede, N., Thall, P. F., Nguyen, H., Paoletti, X. and
Kramar, A. (2010). Utility-based optimization of combination
therapy using ordinal toxicity and efficacy in phase
I/II trials. Biometrics. In press.

Ivanova, A. (2003). A new dose-finding design for bivariate 
outcomes. Biometrics 59 1001-1007. MR2025124 

Liu, C. A. and Braun, T. M. (2009). Parametric non- 
mixture cure models for schedule-finding of therapeutic 
agents. Appl. Statist. 58 225-236. 

Morita, S., Thall, P. F. and Müller, P. (2008). Determining
the effective sample size of a parametric prior.
Biometrics 64 595-602. MR2432433 

O'Quigley, J. (1990). Sequential design and analysis of dose- 
finding studies in patients with life threatening disease. 
Fund. Clin. Pharmacology 4 (Suppl. 2) 81s-91s. 

O'Quigley, J., Hughes, M. D. and Fenton, T. (2001). 
Dose-finding designs for HIV studies. Biometrics 57 1018- 
1029. MR1973811 

O'Quigley, J., Pepe, M. and Fisher, L. (1990). Continual 
reassessment method: A practical design for phase I clinical 
trials in cancer. Biometrics 46 33-48. MR1059105 

Palmer, C. R. (2002). Ethics, data-dependent designs, and 
the strategy of clinical trials: Time to start learning-as-we- 
go? Stat. Methods Med. Res. 5 381-402. 

Ratain, M. J., Mick, R., Janisch, L., Berezin, F., 
Schilsky, R. L., Vogelzang, N. J. and Kut, M. (1996). 
Individualized dosing of amonafide based on a pharma- 
codynamic model incorporating acetylator phenotype and 
gender. Pharmacogenetics 6 93-101. 

Thall, P. F. and Cook, J. D. (2004). Dose-finding based 
on efficacy-toxicity trade-offs. Biometrics 60 684-693. 
MR2089444 

Thall, P. F. and Russell, K. T. (1998). A strategy for 
dose finding and safety monitoring based on efficacy and 
adverse outcomes in phase I/II clinical trials. Biometrics 
54 251-264. 

Thall, P. F., Cook, J. D. and Estey, E. H. (2006). Adap- 
tive dose selection using efficacy-toxicity trade-offs: Illus- 
trations and practical considerations. J. Biopharm. Statist. 
16 623-638. MR2252311 

Thall, P. F., Nguyen, H. and Estey, E. H. (2008). 
Patient-specific dose-finding based on bivariate outcomes 
and covariates. Biometrics 64 1126-1136. 

Thall, P. F., Lee, J. J., Tseng, C.-H. and Estey, E. H. 
(1999). Accrual strategies for phase I trials with delayed 
patient outcome. Stat. Med. 18 1155-1169. 

Thall, P. F., Millikan, R. E., Müller, P. and Lee, S.-J.
(2003). Dose-finding with two agents in phase I oncology 
trials. Biometrics 59 487-496. MR2004253 

Thall, P. F., Simon, R. and Estey, E. H. (1995). Bayesian 
sequential monitoring designs for single-arm clinical trials 
with multiple outcomes. Stat. Med. 14 357-379. 

Tukey, J. W. (1949). One degree of freedom for non- 
additivity. Biometrics 5 232-242. 

Whelan, H. T., Cook, J. D., Amlie-Lefond, C. M., Hov- 
inga, C. A., Chan, A. K., Ichord, R. N., deVeber, G. 
A. and Thall, P. F. (2008). Practical model-based dose- 
finding in early phase clinical trials: Optimizing tissue plas- 
minogen activator dose for treatment of ischemic stroke in 
children. Stroke 39 2627-2636. 



